Subject: The Lazy Bum's Metapackaging for Package Management (with patches)
To: None <tech-pkg@netbsd.org>
From: J Chapman Flack <flack@cs.purdue.edu>
List: tech-pkg
Date: 03/20/2005 16:44:22
Hi,

I'm wondering if anybody would like to pick apart the bsd.pkg.mk patches
attached below.  They make bin-install behave more like what I always thought
it would do, but I don't know whether they'd change behavior somebody else
relies on.

I'd been thinking for a while of writing some bare-bones script that would
contain the list of packages I want on a system ... something only a little
more sophisticated than
  for i in foo bar baz; do (cd $i && make bin-install); done
Something that would speed setting up a new system, or just provide a fast,
reliable scorch-and-bin-install-everything-again solution when some tiny
version bump propagates everywhere.

It finally dawned on me that what I really wanted wasn't to write my own script
but to make the best use of the pkgsrc tools by writing a metapackage - nothing
but a /usr/pkgsrc/local/mymachine/Makefile with DEPENDS+= all the packages I
care about, and roughly nothing else.  Then one 'make bin-install' finds all of
those and their prerequisites, using binaries where available and building when
needed. Adding a new package is just adding a DEPENDS+= line and make deinstall
bin-install, where deinstall only removes the (empty) mymachine package, and
bin-install detects the missing new package and adds it, and the Makefile is the
up-to-date record of the packages wanted.  Scorching the earth and reinstalling
is just pkg_delete mymachine && make bin-install.  pkg_info -S mymachine is
the total size of wanted packages and prerequisites, and anything shown by
pkg_info -R that isn't required by mymachine was a build-prerequisite no
longer needed.  And the mymachine Makefile is short enough even to type in
when setting up a new machine, then type make bin-install and watch it go.
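
For illustration only, here is a small shell sketch that prints the DEPENDS+=
lines such a Makefile would carry.  The category/name argument format and the
"name-[0-9]*" (any version) pattern are my assumptions for the example; it
also assumes each package's name matches its directory name, which isn't true
of every package, and it leaves out the rest of the Makefile.

```shell
#!/bin/sh
# Sketch: emit one DEPENDS+= line per wanted package.
# Each argument is a pkgsrc directory given as category/name (a hypothetical
# input format).  The pattern "name-[0-9]*" asks for any version and assumes
# the package name equals the directory name.
depends_lines() {
	for pkgdir in "$@"; do
		name=${pkgdir##*/}	# strip the category, keep the last component
		printf 'DEPENDS+=\t%s-[0-9]*:../../%s\n' "$name" "$pkgdir"
	done
}

# Example with placeholder package directories.
depends_lines bar/foo bar/baz
```

The output can be pasted into the metapackage Makefile; tightening a pattern
(say, to foo>=1.2nb2) is then a one-line edit.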

It was easy enough to get started, but I wound up with some bsd.pkg.mk
patches to suggest, which make it work more smoothly.

The first thing I noticed was that 'make bin-install' would say 'Binary
install for mymachine' and then immediately start *source* builds for all
the prerequisites.  That one was easy:

*** 1098,1104 ****
  .    else
  DEPENDS_TARGET=	update
  .    endif
! .  elif make(bin-install)
  DEPENDS_TARGET=	bin-install
  .  else
  DEPENDS_TARGET=	reinstall
--- 1098,1104 ----
  .    else
  DEPENDS_TARGET=	update
  .    endif
! .  elif make(bin-install)  ||  make(real-su-bin-install)
  DEPENDS_TARGET=	bin-install
  .  else
  DEPENDS_TARGET=	reinstall

The bin-install target immediately reinvokes make at real-su-bin-install,
and the child forgot to set DEPENDS_TARGET appropriately.

Next, it turned out that having some package tgz's cached in
/usr/pkgsrc/packages/All was actually harmful - if not all of their
prerequisites were also cached there, the missing ones would not be looked
for in BINPKG_SITES, and the installation would fail.

Turned out the real-su-bin-install script was doing its own unnecessary search
for the package in PKGREPOSITORY and BINPKG_SITES--one by one--and then giving
pkg_add a PKG_PATH with only the first location found, which guaranteed failure
if some of that package's @pkgdep's had to be found somewhere else.  I
shortened and simplified that script section to simply construct PKG_PATH
out of PKGREPOSITORY and BINPKG_SITES and hand off to pkg_add for the search.
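
The path construction amounts to something like the following sketch.  The
function name is made up for the example, the ';' separator is the one the
patch uses, and the release and architecture in the sample URL ("2.0",
"i386") are just placeholder values.

```shell
#!/bin/sh
# Sketch: join the local repository and each site's All directory into one
# ';'-separated search path, so pkg_add can look for a package and its
# @pkgdep's in every location.
# $1 is the local repository; remaining arguments are binary package sites.
build_pkg_path() {
	pkgpath=$1
	shift
	for site in "$@"; do
		pkgpath="$pkgpath;$site/All"
	done
	printf '%s\n' "$pkgpath"
}

# Example with placeholder rel/arch values already substituted.
build_pkg_path /usr/pkgsrc/packages/All \
	ftp://ftp.NetBSD.org/pub/NetBSD/packages/2.0/i386
```

The result would be handed to pkg_add as PKG_PATH in a single invocation,
instead of one invocation per site.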

The final surprise was that, given DEPENDS+= foo>=1.2nb2:../../bar/foo
and foo's Makefile provides version 1.2nb3, even if a 1.2nb2 binary exists
and would satisfy the requirement it is ignored and foo gets built from source.
I was wondering how hard it would be to add a variable that records the
package pattern actually required, so the source build can be avoided as long
as there's a binary that matches the pattern - and then I saw that exactly
the variable I wanted was already there: PKGNAME_REQD, which bsd.pkg.mk set
but never used.  Now that I had pkg_add already handling the search in
real-su-bin-install, and pkg_add understands package patterns, it was trivial
to pass it the pattern PKGNAME_REQD if it is defined, otherwise the exact
PKGNAME from the Makefile as before.  So now bin-install will satisfy
prerequisites with a binary package as long as there is any available that
meets the requirement, and only build from source if there isn't...and now
there's an answer to the question "what's this variable used for?"  :)

  Note: to make effective use of PKGNAME_REQD, I backed out an earlier change
  (1.1300) that put apostrophe characters into the package name.  Apparently
  there is at least one platform where make (I'm guessing) does broken quoting,
  and putting apostrophes in the package name made the broken platform work.
  It'd be possible to do a workaround of a workaround in bin-install, but it
  would be nicer to find out what's really broken in the broken platform, and
  then maybe there's a more targeted solution.

This doesn't change the behavior if you explicitly cd to a package directory
and type make bin-install - then (because you haven't defined PKGNAME_REQD)
you are assured of getting the current version of the package, built while
you wait (and wait, and wait) if it isn't already available.  But for
bin-installing several packages you care about that have a whole bunch of
prerequisites you don't care about as long as they're recent enough to work,
you now get the prerequisites quickly installed from any sufficiently recent
package if possible.  That's the way I'd always hoped bin-install would work.
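
The choice of what to hand pkg_add can be pictured with this shell sketch.
It is only an analogy for the make expression ${PKGNAME_REQD:U${PKGNAME}} in
the patch: the function name is invented, and an empty first argument stands
in for "PKGNAME_REQD is not defined".

```shell
#!/bin/sh
# Sketch: pick the name or pattern to request from pkg_add.
# $1: required pattern passed down from a dependent package ("" if none)
# $2: exact PKGNAME from the package's own Makefile
pkg_to_request() {
	reqd=$1
	exact=$2
	if [ -n "$reqd" ]; then
		printf '%s\n' "$reqd"	# any binary matching the pattern will do
	else
		printf '%s\n' "$exact"	# explicit bin-install: current version only
	fi
}

pkg_to_request 'foo>=1.2nb2' foo-1.2nb3
pkg_to_request '' foo-1.2nb3
```

The first call models foo being pulled in as a prerequisite; the second models
typing make bin-install in foo's own directory.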

There is one wrinkle my current patches don't address: if foo has a binary
package but one of its prerequisites bar does not, then the pkg_add of foo
gives up completely, and foo gets built from source.  A stopgap solution
would be to change the default behavior (perhaps with an option?) from
  pkg_add || make package clean
to
  pkg_add || { make install-depends && { pkg_add || make package clean; }; }

where the install-depends should take care of all the prerequisites (including
building the missing ones) and the second pkg_add should succeed (unless foo
itself has no binary package, and then the make package makes sense).  I just
haven't done that yet.  It would be easy but has the drawback that all foo's
*build* prerequisites get installed too, unnecessarily.  Factoring out an
install-run-depends target would take care of that.
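
To make the intended control flow of that stopgap concrete, here is a sketch
with stand-in functions.  None of these function names exist in pkgsrc; they
are stubs for pkg_add, make install-depends, and make package clean, and the
stubbed pkg_add is rigged to fail once and then succeed.

```shell
#!/bin/sh
# Sketch of the stopgap: try the binary, install prerequisites on failure,
# retry the binary, and only then fall back to a source build.
steps=""
attempt=0

try_binary() {			# stand-in for pkg_add
	attempt=$((attempt + 1))
	steps="$steps pkg_add"
	[ "$attempt" -ge 2 ]	# fails first, succeeds once prerequisites exist
}
install_depends() {		# stand-in for make install-depends
	steps="$steps install-depends"
}
build_from_source() {		# stand-in for make package clean
	steps="$steps package-clean"
}

bin_install_with_fallback() {
	try_binary && return 0		# binary worked: nothing more to do
	install_depends || return 1	# could not satisfy prerequisites
	try_binary || build_from_source
}

bin_install_with_fallback
echo "steps:$steps"
```

With the stubs as written, the source build is never reached: the second
pkg_add attempt succeeds after install-depends has run.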

---- a slicker way, possible future work ----
A slicker solution would look at the root of the problem: the Makefiles and
pkg_add have their own approaches to installing prerequisites, and changing
circumstances like the presence or absence of certain package versions on the
net outside your control influence the semantics you get.  pkg_add could be
given some private options or environment variables so that it knows when it
is being invoked by the makefiles, along with a way to call back on make
package for an individual prerequisite it can't find as a binary.  Then the expected
semantics of "get everything in binary that you can, and build what you can't"
would be preserved right through pkg_add without any failing and restarting.

I can't propose to write it any time soon, but here's a way I could see it
working.  pkg_add is passed (by a special option or variable) the value of
DEPENDS - exactly the list of pattern:dir pairs.  Now pkg_add has two sources
of dependency information: the DEPENDS list, and the @pkgdep information in a
binary package.  The binary package might be down a version or so, so these
lists might not be identical.  Here are three cases for how bar might appear
as a prerequisite of foo:

  bar>=1.2.2 is in @pkgdep, and no bar entry in DEPENDS.  pkg_add just does
  what it would normally do (so this behavior naturally reverts to the current
  behavior whenever there is no DEPENDS list, as when pkg_add is invoked
  standalone).  If no matching binary package can be found, pkg_add gives up,
  as now; even if it wanted to run make it doesn't know the directory for
  package bar.
  
  bar>=1.2.3 is in DEPENDS, and no bar entry in @pkgdep.  pkg_add ignores it.
  If bar is needed by some later version of foo than the binary one being
  installed, why bother?
  
  bar>=1.2.2 is in @pkgdep *and* bar>=1.2.3 is in DEPENDS.  Here pkg_add should
  only install a binary package that's a match for both patterns - in this
  case anything >=1.2.3, though there could be cases where the intersection
  is less obvious, or is empty.  If bar is needed to install the binary foo
  now, and would also be needed to build foo from source later, may as well
  make the effort not to install a version that would conflict with the other
  purpose.  If nothing suitable is found, pkg_add should fork, cd to the
  directory in the DEPENDS entry, and invoke 'make package clean' using a
  command string passed as another special option or variable (containing
  the MAKEFLAGS and so on).
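
  The three cases reduce to a small decision table, sketched here in shell.
  This is not pkg_add code; the function name and the output strings are
  invented for the example, and the hard part (intersecting two patterns)
  is deliberately left out.

```shell
#!/bin/sh
# Sketch: decide what to do with prerequisite bar of foo.
# $1: bar's pattern from the binary package's @pkgdep ("" if absent)
# $2: bar's pattern from the DEPENDS list ("" if absent)
classify_prereq() {
	if [ -n "$1" ] && [ -z "$2" ]; then
		echo "pkgdep-only: install a binary matching $1, or give up"
	elif [ -z "$1" ] && [ -n "$2" ]; then
		echo "depends-only: ignore $2"
	elif [ -n "$1" ] && [ -n "$2" ]; then
		echo "both: install a binary matching $1 and $2, else build from source"
	else
		echo "neither: nothing to do"
	fi
}

classify_prereq 'bar>=1.2.2' ''
classify_prereq '' 'bar>=1.2.3'
classify_prereq 'bar>=1.2.2' 'bar>=1.2.3'
```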
---- end slicker future work ----

So, those are the ideas, here's the patch.  Please let me know if you see
things it would hurt.

Thanks,
-Chap

*** /usr/pkgsrc/mk/bsd.pkg.mk	Wed Mar  2 22:08:20 2005
--- /usr/pkgsrc/mk/bsd.pkg.mk	Sun Mar 20 16:47:13 2005
***************
*** 1098,1104 ****
  .    else
  DEPENDS_TARGET=	update
  .    endif
! .  elif make(bin-install)
  DEPENDS_TARGET=	bin-install
  .  else
  DEPENDS_TARGET=	reinstall
--- 1098,1104 ----
  .    else
  DEPENDS_TARGET=	update
  .    endif
! .  elif make(bin-install)  ||  make(real-su-bin-install)
  DEPENDS_TARGET=	bin-install
  .  else
  DEPENDS_TARGET=	reinstall
***************
*** 3825,3830 ****
--- 3825,3839 ----
  
  # List of sites carrying binary pkgs. Variables "rel" and "arch" are
  # replaced with OS release ("1.5", ...) and architecture ("mipsel", ...)
+ # XXX the way this variable is used in the shell script, all shell
+ # metacharacters are active, not just variable substitution.  I first
+ # changed it to do all substitution in make, just on the fixed tokens
+ # ${rel} and ${arch}, and pass the result :Q-ed to the shell.  I liked
+ # that much better, but realized it might change semantics for people who
+ # already had funny stuff in their BINPKG_SITES and had had to figure out
+ # how to quote it successfully.  So back to the original semantics - but
+ # worth thinking about if a pure string substitution limited to arch and
+ # rel would be less error prone.
  BINPKG_SITES?= \
  	ftp://ftp.NetBSD.org/pub/NetBSD/packages/$${rel}/$${arch}
  
***************
*** 3837,3842 ****
--- 3846,3854 ----
  _BIN_INSTALL_FLAGS=	${BIN_INSTALL_FLAGS}
  _BIN_INSTALL_FLAGS+=	${PKG_ARGS_ADD}
  
+ _SHORT_UNAME_R!=	uname -r
+ _SHORT_UNAME_R:=	${_SHORT_UNAME_R:C@\.([0-9]*)[_.].*@.\1@} # n.n[_.]anything => n.n
+ 
  # Install binary pkg, without strict uptodate-check first
  .PHONY: real-su-bin-install
  real-su-bin-install:
***************
*** 3848,3879 ****
  		${SHCOMMENT} ${ECHO_MSG} "*** or use \`\`${MAKE} bin-update'' to upgrade it and all of its dependencies."; \
  		exit 1;							\
  	fi
! 	@if [ -f ${PKGFILE} ] ; then 					\
! 		${ECHO_MSG} "Installing from binary pkg ${PKGFILE}" ;	\
! 		${PKG_ADD} ${_BIN_INSTALL_FLAGS} ${PKGFILE} ;		\
  	else 				 				\
! 		rel=`${UNAME} -r | ${SED} 's@\.\([0-9]*\)[\._].*@\.\1@'`; \
! 		arch=${MACHINE_ARCH}; 					\
! 		for site in ${BINPKG_SITES} ; do 			\
! 			${ECHO} Trying `eval ${ECHO} $$site`/All ; 	\
! 			${SHCOMMENT} ${ECHO} ${SETENV} PKG_PATH="`eval ${ECHO} $$site`/All" ${PKG_ADD} ${_BIN_INSTALL_FLAGS} ${PKGNAME}${PKG_SUFX} ; \
! 			if ${SETENV} PKG_PATH="`eval ${ECHO} $$site`/All" ${PKG_ADD} ${BIN_INSTALL_FLAGS} ${PKGNAME}${PKG_SUFX} ; then \
! 				${ECHO} "${PKGNAME} successfully installed."; \
! 				break ; 				\
! 			fi ; 						\
! 		done ; 							\
! 		if ! ${PKG_INFO} -qe "${PKGNAME}" ; then 		\
! 			${SHCOMMENT} Cycle through some FTP server here ;\
! 			${ECHO_MSG} "Installing from source" ;		\
! 			${MAKE} ${MAKEFLAGS} package 			\
! 				DEPENDS_TARGET=${DEPENDS_TARGET:Q} &&	\
! 			${MAKE} ${MAKEFLAGS} clean ;			\
! 		fi ; \
  	fi
  
  .PHONY: bin-install
  bin-install:
! 	@${ECHO_MSG} "${_PKGSRC_IN}> Binary install for ${PKGNAME}"
  	${_PKG_SILENT}${_PKG_DEBUG}					\
  	realtarget="real-su-bin-install";				\
  	action="binary install";					\
--- 3860,3884 ----
  		${SHCOMMENT} ${ECHO_MSG} "*** or use \`\`${MAKE} bin-update'' to upgrade it and all of its dependencies."; \
  		exit 1;							\
  	fi
! 	@rel=${_SHORT_UNAME_R:Q} ; \
! 	arch=${MACHINE_ARCH:Q} ; \
! 	pkgpath=${PKGREPOSITORY:Q} ; \
! 	for i in ${BINPKG_SITES} ; do pkgpath="$$pkgpath;$$i/All" ; done ; \
! 	${ECHO} "Trying $$pkgpath" ; 	\
! 	if ${SETENV} PKG_PATH="$$pkgpath" ${PKG_ADD} ${BIN_INSTALL_FLAGS} ${PKGNAME_REQD:U${PKGNAME}:Q} ; then \
! 		${ECHO} ${PKGNAME_REQD:U${PKGNAME}:Q} successfully installed.; \
! 		${TRUE} ; 				\
  	else 				 				\
! 		${SHCOMMENT} Cycle through some FTP server here ;\
! 		${ECHO_MSG} "Installing from source" ;		\
! 		${MAKE} ${MAKEFLAGS} package 			\
! 			DEPENDS_TARGET=${DEPENDS_TARGET:Q} &&	\
! 		${MAKE} ${MAKEFLAGS} clean ;			\
  	fi
  
  .PHONY: bin-install
  bin-install:
! 	@${ECHO_MSG} "${_PKGSRC_IN}> Binary install for "${PKGNAME_REQD:U${PKGNAME}:Q}
  	${_PKG_SILENT}${_PKG_DEBUG}					\
  	realtarget="real-su-bin-install";				\
  	action="binary install";					\
***************
*** 3973,3983 ****
  .    else	# !DEPENDS
  .      for dep in ${DEPENDS} ${BUILD_DEPENDS}
  	${_PKG_SILENT}${_PKG_DEBUG}					\
! 	pkg="${dep:C/:.*//}";						\
! 	dir="${dep:C/[^:]*://:C/:.*$//}";				\
  	found=`${PKG_BEST_EXISTS} "$$pkg" || ${TRUE}`;			\
  	if [ "X$$REBUILD_DOWNLEVEL_DEPENDS" != "X" ]; then		\
! 		pkgname=`cd $$dir ; ${MAKE} ${MAKEFLAGS} show-var VARNAME=PKGNAME`; \
  		if [ "X$$found" != "X" -a "X$$found" != "X$${pkgname}" ]; then \
  			${ECHO_MSG} "ignoring old installed package \"$$found\""; \
  			found="";					\
--- 3978,3988 ----
  .    else	# !DEPENDS
  .      for dep in ${DEPENDS} ${BUILD_DEPENDS}
  	${_PKG_SILENT}${_PKG_DEBUG}					\
! 	pkg=${dep:C/:.*//:Q};						\
! 	dir=${dep:C/[^:]*://:C/:.*$//:Q};				\
  	found=`${PKG_BEST_EXISTS} "$$pkg" || ${TRUE}`;			\
  	if [ "X$$REBUILD_DOWNLEVEL_DEPENDS" != "X" ]; then		\
! 		pkgname=`cd "$$dir" ; ${MAKE} ${MAKEFLAGS} show-var VARNAME=PKGNAME`; \
  		if [ "X$$found" != "X" -a "X$$found" != "X$${pkgname}" ]; then \
  			${ECHO_MSG} "ignoring old installed package \"$$found\""; \
  			found="";					\
***************
*** 4002,4012 ****
  		${ECHO_MSG} "${_PKGSRC_IN}> Required package $$pkg: NOT found"; \
  		target=${DEPENDS_TARGET:Q};				\
  		${ECHO_MSG} "${_PKGSRC_IN}> Verifying $$target for $$dir"; 	\
! 		if [ ! -d $$dir ]; then					\
  			${ECHO_MSG} "=> No directory for $$dir.  Skipping.."; \
  		else							\
! 			cd $$dir ;					\
! 			${SETENV} _PKGSRC_DEPS=", ${PKGNAME}${_PKGSRC_DEPS}" ${MAKE} ${MAKEFLAGS} $$target PKGNAME_REQD=\'$$pkg\' || exit 1; \
  			${ECHO_MSG} "${_PKGSRC_IN}> Returning to build of ${PKGNAME}"; \
  		fi;							\
  	fi
--- 4007,4017 ----
  		${ECHO_MSG} "${_PKGSRC_IN}> Required package $$pkg: NOT found"; \
  		target=${DEPENDS_TARGET:Q};				\
  		${ECHO_MSG} "${_PKGSRC_IN}> Verifying $$target for $$dir"; 	\
! 		if [ ! -d "$$dir" ]; then					\
  			${ECHO_MSG} "=> No directory for $$dir.  Skipping.."; \
  		else							\
! 			cd "$$dir" ;					\
! 			${SETENV} _PKGSRC_DEPS=", ${PKGNAME}${_PKGSRC_DEPS}" ${MAKE} ${MAKEFLAGS} $$target PKGNAME_REQD="$$pkg" || exit 1; \
  			${ECHO_MSG} "${_PKGSRC_IN}> Returning to build of ${PKGNAME}"; \
  		fi;							\
  	fi
***************
*** 4326,4332 ****
  	| ${SORT} -u							\
  	| ${SED} -e "s/'/'\\\\''/g" -e "s/.*/'&'/"			\
  	| ${XARGS} -n 256 ${LS} -ld					\
! 	| ${AWK} '{ s += $$5; } END { print s; }'			\
  
  # Sizes of required pkgs (only)
  #
--- 4331,4337 ----
  	| ${SORT} -u							\
  	| ${SED} -e "s/'/'\\\\''/g" -e "s/.*/'&'/"			\
  	| ${XARGS} -n 256 ${LS} -ld					\
! 	| ${AWK} '{ s += $$5; } END { print 0 + s; }'			\
  
  # Sizes of required pkgs (only)
  #