Subject: Proof-of-concept: upgrading binary packages
To: None <tech-pkg@NetBSD.org>
From: Havard Eidnes <he@NetBSD.org>
List: tech-pkg
Date: 02/03/2006 00:26:32
----Next_Part(Fri_Feb__3_00_26_32_2006_605)--
Content-Type: Text/Plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable

Hi,

I've for a few days on and off put together a prof-of-concept
script which appears to be able to upgrade a set of binary
packages on a system using a set of packages from a remote
package repository.

The script depends on

 o the repository having been prepared in advance with a script
   which deposits a "binpkg-map" file in the $ARCH/All/
   directory, of the form

py23-reportlab                           1.19nb1              V
py23-reportlab 1.19nb1 py23-xml-0.8.3nb2 B
py23-reportlab 1.19nb1 py23-xml>=3D0.8.3nb1 D
py23-reportlab 1.19nb1 python23-2.3.4nb7 B
py23-reportlab 1.19nb1 python23>=3D2.3.3nb2 D
ap-ruby18                                1.2.4                V
ap-ruby18 1.2.4 apache-1.3.33nb1 B
ap-ruby18 1.2.4 apache{,6}-1.3.* D
ap-ruby18 1.2.4 ruby18-eruby-1.0.5nb1 B
ap-ruby18 1.2.4 ruby18-eruby>=3D0.9.7 D
ap-ruby18 1.2.4 ruby18-1.8.1nb2 B
ap-ruby18 1.2.4 ruby18>=3D1.8.1 D
ap-ruby18 1.2.4 ap2-ruby* C

   ftp://securitate.uninett.no/pub/NetBSD/packages/pkgsrc-2005Q4-200601=
19/NetBSD-2.0.2_STABLE/i386/All

   is a repository which has been thus prepared.

 o is written in perl (so obviously it needs to be installed); the
   dewey code is borrowed / reimplemented from the pkg_install
   library

 o depends on audit-packages, pkg_tarup, pkgdepgraph

 o the process grows to 10MB, mostly due to keeping all the above
   version and dependency information in memory

 o may end up being interactive, when several major versions of a
   base package name is available (emacs, suse_compat etc.), and the
   selection of which to prefer hasn't already been recorded.

 o is incomplete; before it barges ahead and deletes existing
   packages, it should probably do a consistency check of internal
   conflicts and dependencies among the new set of selected packages,
   and abort if any problems are found, so that you don't end up with
   a partially updated set of packages

 o probably embeds too much policy as hardcoded

 o updates all packages; could have option to just touch packages
   flagged by audit-packages?

 o does not rely on pkg_add's built-in "automatically pull in all
   dependencies" machinery -- first off, if there's trouble with the
   ftp server or the network, there is value in fetching all the
   binaries before deletion starts.

 o Could be cleverer in what it transfers (it now transfers everything
   each go-around; store checksums in the binpkg-map file?)


I'm at this point basically wondering whether some of the overall
features this script implements is something we should transform into
something we can re-code in C and include in pkg_install?

(I've so far not yet run the two last stanzas of the script, after the
exit(0), but at least the delete_order file *looks* fine, as does the
re-add script, and I've used pkgdepgraph in the past to good effect
doing source-based upgrades...)


Comments welcome.


Regards,

- H=E5vard

----Next_Part(Fri_Feb__3_00_26_32_2006_605)--
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="pkg-update.pl"

#! /usr/bin/perl

# $Id: pkg-update.pl,v 1.3 2006/02/02 23:25:44 he Exp $

#
# Upgrade the installed packages on the installed system.
# Depends on pkg-ver-map.pl having been run on the repository,
# to export the binary package versions, dependencies and conflicts
# to users of the repository.
#
# Repository typically given via PKG_PATH or -r option.
#

use Getopt::Std;
# use Text::Glob;


sub warn {
    my($l) = @_;

    printf(STDERR "%s", $l);
}

#
# Pkgsrc version number handling
#

%modifier_value = (
    "alpha" => -3,
    "beta" => -2,
    "pre" => -1,
    "RC" => -1,
    "pl" => 0,
    "_" => 0,
    '\.' => 0,
);

sub mkversion {
    my($vstr) = @_;
    my($v) = {};

    $$v{version} = ();
    $$v{netbsd} = 0;

  OUTER:
    while ($vstr ne "") {
	if ($vstr =~ /^(\d+)(.*)/) {
	    push(@{$$v{version}}, $1);
	    $vstr = $2;
	    next OUTER;
	}
	for my $mk (keys %modifier_value) {
	    if ($vstr =~ /^$mk(.*)/) {
		push(@{$$v{version}}, $modifier_value{$mk});
		$vstr = $1;
		next OUTER;
	    }
	}
	if ($vstr =~ /^nb(\d+)(.*)/) {
	    $$v{netbsd} = $1;
	    $vstr = $2;
	    next OUTER;
	}
	if ($vstr =~ /^([a-z])(.*)/) {
	    push(@{$$v{version}}, ord($1) - ord('a') + 1);
	    $vstr = $2;
	    next OUTER;
	}
    }
    return $v;
}

sub max {
    my($m) = undef;

    for my $v (@_) {
	if (!defined($m)) {
	    $m = $v;
	    next;
	}
	if ($v > $m) {
	    $m = $v;
	}
    }
    return $m;
}

sub cmp_result {
    my($v, $op) = @_;

    if ($op eq "<="){ return $v <= 0;}
    if ($op eq "<") { return $v < 0; }
    if ($op eq ">="){ return $v >= 0;}
    if ($op eq ">") { return $v > 0; }
    if ($op eq "==") { return $v == 0;}
    if ($op eq "!="){ return $v != 0;}
    return 0;
}

sub vtest {
    my($a, $cmp_op, $b) = @_;
    my($l) = &max(scalar @{$$a{version}}, scalar @{$$b{version}});
    my($av, $bv, $cmp);

    for (my $i = 0; $i < $l; $i++) {
	if (!defined($$a{version}[$i])) {
	    $av = 0;
	} else {
	    $av = $$a{version}[$i];
	}
	if (!defined($$b{version}[$i])) {
	    $bv = 0;
	} else {
	    $bv = $$b{version}[$i];
	}
	if (($cmp = $av - $bv) != 0) {
	    return &cmp_result($cmp, $cmp_op);
	}
    }
    return &cmp_result($$a{netbsd} - $$b{netbsd}, $cmp_op);
}

sub dewey_major {
    my($version) = @_;

    my($v) = &mkversion($version);
    return ${$$v{version}}[0];
}

sub dewey_cmp {
    my($lhs, $cmp_op, $rhs) = @_;
    my($vl, $vr);
    
    $vl = &mkversion($lhs);
    $vr = &mkversion($rhs);

    return &vtest($vl, $cmp_op, $vr);
}

@cmp_ops = ("<=", "<", ">=", ">", "==", "!=");


sub split_pattern {
    my($pat) = @_;

    foreach my $op (@cmp_ops) {
	if ($pat =~ /(.*)$op(.*)/) {
	    return ($1, $op, $2);
	}
    }
    return undef;
}

sub dewey_match {
    my($pattern, $pkg) = @_;
    my($pkgname, $pkgversion);
    my($pname, $op, $pver);

    if ($pkg =~ /(.*)-([^-]+)$/) {
	$pkgname = $1;
	$pkgversion = $2;
    } else {
	return 0;
    }
    ($pname, $op, $pver) = &split_pattern($pattern);
    if ($pname ne $pkgname) { return 0; }
    if ($op eq ">" || $op eq ">=") {
	if ($pver =~ /">"/) {
	    my($op2, $pver2);
	    ($pver, $op2, $pver2) = &split_pattern($pver);
	    if (! &dewey_cmp($pkgversion, $op2, $pver2)) {
		return 0;
	    }
	}
    }
    return &dewey_cmp($pkgversion, $op, $pver);
}

# Emulate csh alternative matching using {,} syntax
# Only handles a single level of nesting

sub alternate_match {
    my($pattern, $pkg) = @_;
    my($base, $alts, $rest);
    my(@alts) = ();

    if ($pattern !~ /^([^{]*)\{([^}]*)\}(.*)$/) {
	return undef;
    }
    $base = $1;
    $alts = $2;
    $rest = $3;
    @alts = split(/,/, $alts);
    if ($pattern =~ /,\}/) {
	push(@alts, "");	# split eats empty trailing fields -- compensate
    }
    
    foreach my $alt (@alts) {
	if (&pmatch($base . $alt . $rest, $pkg)) {
	    return 1;
	}
    }
    return 0;
}

sub pmatch {
    my($pattern, $pkg) = @_;

    if ($pattern =~ /\{/) {
	return &alternate_match($pattern, $pkg);
    }
    if ($pattern =~ /[<>]/) {
	return &dewey_match($pattern, $pkg);
    }
    if ($pattern =~ /[*?\[\]]/) {
#	return &Text::Glob::match_glob($pattern, $pkg);
	$pattern =~ s/\*/.*/g;	# we cheat for now
	return $pkg =~ /$pattern/;
    }
    return $pattern eq $pkg;
}


#
# 
#

sub pkgversion {
    my($pkg) = @_;
    my($pkgname, $version);

    if ($pkg =~ /(.*)-(\d.*)/) {
	return ($1, $2);
    }
    &warn("Could not split $pkg into pkg + version");
    return undef;
}

sub get_installed_packages {
    my($oldpath);

    open(IN, "pkg_info|") || die "$0: Could not do pkg_info: $!";
    while(<IN>) {
	chomp; split;
	my($pkg, $version) = &pkgversion($_[0]);
	$installed{$pkg} = $version;
    }
    close(IN);
}

sub get_repository_packages {
    my($rep) = @_;

    open(IN, "ftp -V -o - $rep" . "/binpkg-map </dev/null |") ||
	die "$0: Could not open remote binpkg-map";
    while(<IN>) {
	chomp; split;
	if (/V$/) {
	    if (! defined($rep_ver{$_[0]})) {
		$rep_ver{$_[0]} = ();
	    }
	    push(@{$rep_ver{$_[0]}}, $_[1]);
	    next;
	}
	if (/B$/) {
	    my($pkg, $ver) = ($_[0], $_[1]);
	    my($k) = $pkg . "-" . $ver;
	    if (! defined($rep_builddep{$k})) {
		$rep_builddep{$k} = ();
	    }
	    push(@{$rep_builddep{$k}}, $_[2]);
	    next;
	}
	if (/C$/) {
	    my($pkg, $ver) = ($_[0], $_[1]);
	    my($k) = $pkg . "-" . $ver;
	    if (! defined($rep_conflicts{$k})) {
		$rep_conflicts{$k} = ();
	    }
	    push(@{$rep_conflicts{$k}}, $_[2]);
	    next;
	}
	if (/D$/) {
	    my($pkg, $ver) = ($_[0], $_[1]);
	    my($k) = $pkg . "-" . $ver;
	    if (! defined($rep_depends{$k})) {
		$rep_depends{$k} = ();
	    }
	    push(@{$rep_depends{$k}}, $_[2]);
	    next;
	}
    }
    close(IN);
}


#
# For testing, essentially re-implement audit-packages
#

sub read_vulns {
    my($fn) = @_;

    open(IN, $fn) || die "Could not open $fn: $!";
    while(<IN>) {
	if (/^\#/) { next; }
	chomp; split;
	push(@vuln_patterns, $_[0]);
	push(@vuln_type, $_[1]);
	push(@vuln_url, $_[2]);
	my($num,$type) = split(/,/, $_[1]);
	push(@vuln_id, $num);
    }
    close(IN);
}

sub emit_vulnerables {

    for (my $i = 0; $i <= $#vuln_patterns; $i++) {
	my $pat = $vuln_patterns[$i];
	my $id = $vuln_id[$i];
	for my $pkg (keys %installed) {
	    if (&pmatch($pat, $pkg)) {
		printf("%-30s %6d %s\n", $pkg, $id, $pat);
	    }
	}
    }
}

# Print mismatches between installed and repository versions
# with output of the form produced by "lintpkgsrc -i", but without
# use of the pkgsrc source tree.

sub print_mismatches {
    my($fh, $misref, $mismatchref) = @_;

    foreach my $k (keys %installed) {
	if (!defined $rep_ver{$k}) {
	    if (!defined($misref)) {
		$misref = ();
	    }
	    push(@{$misref}, $k);
	}
    }

    foreach my $k (keys %installed) {
	my($rv) = join(",", @{$rep_ver{$k}});
	if ($rv ne $installed{$k}) {
	    printf($fh "Version mismatch: '%s' %s vs %s\n",
		   $k, $installed{$k}, $rv);
	    if (!defined($mismatchref)) {
		$mismatchref = ();
	    }
	    push(@{$mismatchref}, $k);
	}
    }
}

# pkg_tarup the given installed packages, save in $old_pkgs directory

sub save_installed_pkgs {
    my(@pkgs) = @_;

    foreach my $pkg (@pkgs) {
	my($v) = $installed{$pkg};
	system("pkg_tarup -d " . $old_pkgs . " " . $pkg . "-" . $v) == 0 ||
	    die "Could not save $pkg package: $!";
    }
}

# read output from "pkgdepgraph -D", return list of packages (sans version)

sub read_delete_list {
    my($lref) = @_;

    open(IN, "/tmp/delete_order") ||
	die "Could not open /tmp/delete_order: $!";
    while(<IN>) {
	chomp; split;
	my($pkg, $ver) = &pkgversion($_[0]);
	if (! defined($lref)) {
	    $lref = ();
	}
	push(@{$lref}, $pkg);
    }
    close(IN);
}

$select = "/usr/pkgsrc/packages/ver-sel";

sub read_selections {

    open(SEL, $select) || return;
    while(<SEL>) {
	chomp; split;
	$selected_major{$_[0]} = $_[1];
    }
    close(SEL);
    printf("Selection of pkg majors read from $select\n");
}

sub save_selections {

    open(SEL, ">" . $select) || die "Could not write $select : $!";
    foreach my $pkg (keys %selected_major) {
	printf(SEL "%s %s\n", $pkg, $selected_major{$pkg});
    }
    close(SEL);
    printf("Selection of pkg majors saved to $select\n");
}

sub highest_version {
    my(@vl) = @_;

    my $m = 0;

    foreach $v (@vl) {
	if (&dewey_cmp($v, ">", $m)) {
	    $m = $v;
	}
    }
    return $m;
}

sub best_major {
    my($maj, @vers) = @_;
    my(@cand) = ();

    foreach my $v (@vers) {
	if (&dewey_cmp($maj, "<", $v)) {
	    push(@cand, $v);
	}
    }
    my $v = &highest_version(@cand);
}

sub same_major {
    my(@vl) = @_;
    my($m);

    if (scalar(@vl) == -1) { return 0; }
    
    for (my $i = 0, $m = &dewey_major($vl[0]); $i <= $#vl; $i++) {
	if (&dewey_major($vl[$i]) != $m) {
	    return 0;
	}
    }
    return 1;
}

sub manual_select {
    my($pkg, @versions) = @_;

    select(STDOUT);
    $| = 1;			# we're interactive

    while(1) {
	printf("Package %s available in multiple major versions:\n");
	my $i = 1;
	foreach my $v (@versions) {
	    printf(" %d) %s\n", $i, $v);
	    $i++;
	}
	printf("Please select one by giving its number: ");
	my $resp = <>;
	if (defined($versions[$resp-1])) {
	    return $versions[$resp-1];
	} else {
	    printf("Invalid selection $resp, please try again.\n\n");
	}
    }
}

sub choose_ver {
    my($pkg, @versions) = @_;
    my($v);

    if (&same_major(@versions)) {
	$v = &highest_version(@versions);
    } else {
	if (defined($selected_major{$pkg})) {
	    $v = &best_major($selected_major{$pkg}, @versions);
	} else {
	    $v = &manual_select($pkg, @versions);
	    $selected_major{$pkg} = &dewey_major($v);
	}
    }
    return $v;
}

sub fixup_re_add {
    my($rep, $misref, $delref) = @_;
    my(@fl) = ();
    my(@pl) = ();
    my(%missing, %deleted);

    foreach my $pkg (@{$misref}) {
	$missing{$pkg} = 1;
    }
    foreach my $pkg (@{$delref}) {
	$deleted{$pkg} = 1;
    }

    &read_selections();

    open(IN, "/tmp/re-add.sh") || die "Could not open /tmp/re-add.sh: $!";
    while(<IN>) {
	chomp; split;
	if (! /pkg_info/) { next; }
	my $package = $_[7];
	$package =~ s/.tgz$//;
	my($pkg, $ver) = &pkgversion($package);
	if ($ver =~ /,/) {
	    $ver = &choose_ver($pkg, split(/,/, $ver));
	}
	my $k = $pkg . "-" . $ver;
	if (! defined($missing{$pkg})) {
	    push(@pl, $pkg);
	    push(@fl, $pkg . "-" . $ver . ".tgz");
	}
    }
    close(IN);

    &save_selections();

    open(OUT, ">/tmp/re-add.sh") ||
	die "Could not open /tmp/re-add.sh for write: $!";
    printf(OUT "PKG_PATH=%s\n" .
	   "export PKG_PATH\n", $new_pkgs);

    # We put priority on having packages installed and not break the
    # installation.
    #
    # There may be considerable lag between a vulnerability being
    # discovered and published, and when a new binary package is
    # available, particularly since bulk builds currently have to be
    # done serially on a single CPU on a single host.

    printf(OUT "ALLOW_VULNERABLE_PACKAGES=yes\n");
    printf(OUT "export ALLOW_VULNERABLE_PACKAGES\n");

    for (my $i = 0; $i <= $#fl; $i++) {
	printf(OUT "( pkg_info -qe %s || ( pkg_add %s ) ) &&\n",
	       $pl[$i], $fl[$i]);
    }
    printf(OUT "PKG_PATH=%s\nexport PKG_PATH\n", $old_pkgs);
    foreach my $pkg (@{$misref}) {
	if (defined($deleted{$pkg})) {
	    printf(OUT "( pkg_info -qe %s || ( pkg_add %s-%s.tgz ) ) &&\n",
		   $pkg, $pkg, $installed{$pkg});
	}
    }
    printf(OUT "true\n");
    close(OUT);
    return @fl;
}

sub fetch_file {
    my($rep, $f) = @_;

    system(sprintf("ftp %s/%s", $rep, $f)) == 0 ||
	die "Could not fetch $rep/$f: $!";
}

# Given a list of files representing new files, fetch the dist files
# including their dependencies.  Yes, this duplicates some of what
# pkg_add can do, but gives us (hopefully) better error detection,
# and we don't start deleting packages before all the new dist files
# have been transferred to local disk.

sub fetch_all_pkgs {
    my($rep, @files) = @_;
    my(%fetched);

    chdir $new_pkgs || die "Could not chdir to $new_pkgs : $!";

    while ($f = shift(@files)) {
	if (! defined( $fetched{$f})) {
	    &fetch_file($rep, $f);
	    $fetched{$f} = 1;

	    $f =~ s/.tgz$//;
	    my($pkg, $ver) = &pkgversion($f);
	    foreach my $package (@{$rep_builddep{$f}}) {
		push(@files, $package . ".tgz");
	    }
	}
    }
}

#
# Main
#

&getopts("Nnr:", \%opt);

if (! defined($opt{r})) {
    if (defined($ENV{"PKG_PATH"})) {
	$repository = $ENV{"PKG_PATH"};
    } else {
	die "$0: please either specify pkg repository via -r or PKG_PATH";
    }
} else {
    $repository = $opt{r};
}

if (defined($ENV{"PKG_PATH"})) {
    $oldpath = $ENV{"PKG_PATH"};
    delete $ENV{"PKG_PATH"};
}


$new_pkgs = "/usr/pkgsrc/packages/new";
$old_pkgs = "/usr/pkgsrc/packages/old";

#
# Ensure prerequisites are in place
#

if (! -d $new_pkgs) {
    if (! mkdir($new_pkgs)) {
	die "Could not create $new_pkgs : $!";
    }
}
if (! -d "/usr/pkgsrc/packages/old") {
    if (! mkdir($old_pkgs)) {
	die "Could not create $old_pkgs : $!";
    }
}

if (system("pkg_info -qe audit-packages")) {
    die "audit-packages package not installed";
}
if (system("pkg_info -qe pkg_tarup")) {
    die "pkg_tarup package not installed";
}
if (system("pkg_info -qe pkgdepgraph")) {
    die "pkgdepgraph package not installed";
}


&get_installed_packages();
&get_repository_packages($repository);

&read_vulns("/usr/pkgsrc/distfiles/pkg-vulnerabilities");

# Make this optional?  This always picks new packages when available
open(OUT, ">/tmp/pkgdepgraph.in") ||
    die "Could not open /tmp/pkgdepgraph.in: $!";
&print_mismatches(OUT, \@missing, \@mismaches);
close(OUT);
#

system("audit-packages >> /tmp/pkgdepgraph.in") == 0 ||
    die "Could not run audit-packages: $?";

if (!defined($opt{n})) {
    printf(STDERR "Saving installed but not-in-repository packages.\n");
    &save_installed_pkgs(@missing);
}

system("pkgdepgraph -D /tmp/pkgdepgraph.in > /tmp/delete_order") == 0 ||
    die "Could not run pkgdepgraph to compute delete list: $!";

&read_delete_list(\@to_delete);

if (!defined($opt{n})) {
    printf(STDERR "Saving packages to be deleted/upgraded.\n");
    &save_installed_pkgs(@to_delete);
}

$ENV{"PKG_PATH"} = $repository;

system("pkgdepgraph -R -A /tmp/pkgdepgraph.in > /tmp/re-add.sh") == 0 ||
    die "Could not run pkgdepgraph to compute re-add order: $!";

$oldpath = $ENV{"PKG_PATH"};
delete $ENV{"PKG_PATH"};

@files_to_add = &fixup_re_add($repository, \@missing, \@to_delete);

if (!defined($opt{N})) {
    &fetch_all_pkgs($repository, @files_to_add);
}


exit(0);

# Not executed for now, to prevent typos causing an accident...


# Perform consistency checks of all the dependencies for the set of
# new selected packages, so that you don't have conflicts and end
# up with a messed-up package installation?

if (!defined($opt{n})) {
    system("cat /tmp/delete_order | xargs pkg_delete") == 0 ||
	die "Could not delete packages: $!";

    system("sh -x /tmp/re-add.sh") == 0 ||
	die "Could not add new packages";
}

----Next_Part(Fri_Feb__3_00_26_32_2006_605)----