Source-Changes-HG archive

[pkgsrc/trunk]: pkgsrc/mk mk: Rewrite the checksum script in awk.



details:   https://anonhg.NetBSD.org/pkgsrc/rev/c1c61594b1b7
branches:  trunk
changeset: 437590:c1c61594b1b7
user:      jperkin <jperkin%pkgsrc.org@localhost>
date:      Thu Aug 27 11:45:45 2020 +0000

description:
mk: Rewrite the checksum script in awk.

The previous shell script version's runtime was quadratic against the
number of distfiles to verify.  Historically this has not been an issue,
with usually only a handful of files per package.  However, with the
introduction of Go modules the number of distfiles used by a single
package can be very high.
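The quadratic cost comes from the nested loops in the old script. Every distinfo line is compared against every file argument, and each match rebuilds files_left with another full pass over the remaining files. The usual remedy is to key the files by name in an associative array, so each distinfo line costs a single hashed lookup. What follows is a minimal sketch of that idea. It assumes distinfo lines of the form "ALG (file) = checksum". find_unrecorded is a hypothetical helper written for illustration, not code taken from checksum.awk. It only reports files with no recorded checksum and does not compute any digests.

```shell
#!/bin/sh
# Sketch only: hypothetical helper, not part of pkgsrc.
# Reads distinfo-style lines ("ALG (file) = checksum") on stdin and
# reports which of the file name arguments have no checksum recorded.
find_unrecorded() {
	awk '
	BEGIN {
		# Build a set of wanted files, one hashed insert per argument.
		for (i = 1; i < ARGC; i++)
			want[ARGV[i]] = 1
		ARGC = 1	# stop awk treating the arguments as input files
	}
	# Skip comments, RCS IDs, blank lines and Size lines.
	/^#/ || /^\$/ || NF == 0 || $1 == "Size" { next }
	{
		f = $2
		gsub(/^\(|\)$/, "", f)	# "(name)" -> "name"
		if (f in want)
			seen[f] = 1	# hashed lookup instead of a scan
	}
	END {
		# Note: "for (f in want)" visits keys in unspecified order.
		for (f in want)
			if (!(f in seen))
				print "No checksum recorded for " f
	}' "$@"
}
```

For example, `printf 'SHA512 (a.tar.gz) = abc\n' | find_unrecorded a.tar.gz b.tar.gz` prints `No checksum recorded for b.tar.gz`. Each distinfo line and each argument is handled once, so the work grows linearly with their number. When several files are missing, the order of the messages depends on the awk implementation.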

For example, in an upcoming update of www/grafana to version 7.1.5, the
number of GO_MODULE_FILES is 821.  Running 'bmake checksum' takes:

  real    18m20.743s
  user    17m27.975s
  sys     0m49.239s

With the awk code, this is reduced to a far more sensible:

  real    0m4.330s
  user    0m3.241s
  sys     0m0.875s

The script has been written to emulate the previous version precisely,
preserving the same output and error messages and supporting all of its
behaviour, with the one exception that previous exit values of 128 have
been changed to 3, in order to avoid any potential signed 8-bit issues.
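Apart from the error case, the exit values are unchanged. The following minimal sketch shows a caller that maps them to outcomes, following the contract documented in the script header: 0 means all checksums verified, 1 a mismatch, 2 a missing checksum, and anything above 2 an error. classify_checksum_status is a hypothetical name used only for illustration.

```shell
#!/bin/sh
# Sketch only: hypothetical helper, not part of pkgsrc.
# Map a checksum exit status to an outcome, per the documented contract:
# 0 all verified, 1 mismatch, 2 missing checksum, >2 error.
classify_checksum_status() {
	case "$1" in
	0)	echo "ok" ;;
	1)	echo "mismatch" ;;
	2)	echo "unrecorded" ;;
	*)	echo "error" ;;	# covers both the old 128 and the new 3
	esac
}
```

For example, after running something like `awk -f mk/checksum/checksum.awk distinfo file`, pass `$?` to the function. Treating every value above 2 as an error, rather than testing for 3 specifically, keeps such a caller working with both the old and the new script.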

The one change in the pkgsrc infrastructure is that the mk/fetch/fetch
script no longer sets a working default value for ${CHECKSUM}.  This is
not a problem in a pkgsrc environment as all of the required variables
are set correctly, but if there happen to be any users who are using
this script in a standalone environment, they will need to set it
accordingly.  This was probably required in many situations previously
anyway, as in a standalone environment none of the script's other
environment variables would have been set either, and trying to support
this would be fragile at best.

diffstat:

 mk/checksum/checksum     |  191 -----------------------------
 mk/checksum/checksum.awk |  308 +++++++++++++++++++++++++++++++++++++++++++++++
 mk/checksum/checksum.mk  |   10 +-
 mk/fetch/fetch           |    4 +-
 4 files changed, 315 insertions(+), 198 deletions(-)

diffs (truncated from 551 to 300 lines):

diff -r ec2b54d01fc0 -r c1c61594b1b7 mk/checksum/checksum
--- a/mk/checksum/checksum      Thu Aug 27 11:25:29 2020 +0000
+++ /dev/null   Thu Jan 01 00:00:00 1970 +0000
@@ -1,191 +0,0 @@
-#!/bin/sh
-#
-# $NetBSD: checksum,v 1.16 2018/08/22 20:48:36 maya Exp $
-#
-# Copyright (c) 2006, 2007 The NetBSD Foundation, Inc.
-# All rights reserved.
-#
-# This code is derived from software contributed to The NetBSD Foundation
-# by Johnny C. Lam.
-#
-# Redistribution and use in source and binary forms, with or without
-# modification, are permitted provided that the following conditions
-# are met:
-# 1. Redistributions of source code must retain the above copyright
-#    notice, this list of conditions and the following disclaimer.
-# 2. Redistributions in binary form must reproduce the above copyright
-#    notice, this list of conditions and the following disclaimer in the
-#    documentation and/or other materials provided with the distribution.
-#
-# THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
-# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
-# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-# PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
-# BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
-# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
-# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
-# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
-# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
-# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
-# POSSIBILITY OF SUCH DAMAGE.
-#
-
-######################################################################
-#
-# NAME
-#      checksum -- checksum files
-#
-# SYNOPSIS
-#      checksum [options] distinfo [file ...]
-#
-# DESCRIPTION
-#      checksum will verify the checksums in the distinfo file for each
-#      of the files specified.
-#
-#      The checksum utility exits with one of the following values:
-#
-#      0       All of the file checksums verify.
-#
-#      1       At least one of the file checksums did not match.
-#
-#      2       At least one of the files is missing any checksum.
-#
-#      >2      An error occurred.
-#
-# OPTIONS
-#      -a algorithm    Only verify checksums for the specified algorithm.
-#
-#      -p              The specified files are patches, so strip out any
-#                      lines containing NetBSD RCS ID tags before
-#                      computing the checksums for verification.
-#
-#      -s suffix       Strip the specified suffix from the file names
-#                      when searching for the checksum.
-#
-######################################################################
-
-set -e         # exit on errors
-
-: ${DIGEST:=digest}
-: ${CAT:=cat}
-: ${ECHO:=echo}
-: ${SED:=sed}
-: ${TEST:=test}
-
-self="${0##*/}"
-
-usage() {
-       ${ECHO} 1>&2 "usage: $self [-a algorithm] [-p] [-s suffix] distinfo [file ...]"
-}
-
-# Process optional arguments
-algorithm=
-patch=
-suffix=
-while ${TEST} $# -gt 0; do
-       case "$1" in
-       -a)     algorithm="$2"; shift 2 ;;
-       -p)     patch=yes; shift ;;
-       -s)     suffix="$2"; shift 2 ;;
-       --)     shift; break ;;
-       -*)     ${ECHO} 1>&2 "$self: unknown option -- ${1#-}"
-               usage
-               exit 128
-               ;;
-       *)      break ;;
-       esac
-done
-
-# Process required arguments
-${TEST} $# -gt 0 || { usage; exit 128; }
-distinfo="$1"; shift
-files="$@"
-files_left="$@"
-
-if ${TEST} ! -f "$distinfo"; then
-       ${ECHO} 1>&2 "$self: distinfo file missing: $distinfo"
-       exit 128
-fi
-
-digestcmd=
-case "${DIGEST}" in
-/*)
-       if ${TEST} -x "${DIGEST}"; then
-               digestcmd="${DIGEST}"
-       fi
-       ;;
-*)
-       SAVEIFS="$IFS"; IFS=:
-       for i in $PATH; do
-               if ${TEST} -x "$i/${DIGEST}"; then
-                       digestcmd="$i/${DIGEST}"
-                       break
-               fi
-       done
-       IFS="$SAVEIFS"
-       ;;
-esac
-
-if ${TEST} -z "$digestcmd"; then
-       ${ECHO} 1>&2 "$self: \`\`${DIGEST}'' is missing"
-       exit 128
-fi
-
-{ exitcode=0
-  while read d_alg d_file d_equals d_checksum; do
-       case "$d_alg" in
-       "#"*)   continue ;;     # skip comments
-       "\$"*)  continue ;;     # skip RCS ID
-       "")     continue ;;     # skip empty lines
-       Size)   continue ;;     # skip lines holding filesizes, not checksums
-       esac
-
-       if ${TEST} -n "$algorithm"; then
-               ${TEST} "$d_alg" = "$algorithm" || continue
-       fi
-
-       for file in $files; do
-               sfile="${file%$suffix}"
-               ${TEST} -z "$patch" || sfile="${sfile##*/}"
-               ${TEST} "$d_file" = "($sfile)" || continue
-
-               new_files_left=
-               for file_left in $files_left; do
-                       ${TEST} "${file_left}" = "${file}" || \
-                           new_files_left="${new_files_left} ${file_left}"
-               done
-               files_left="${new_files_left}"
-
-               if ${TEST} "$d_checksum" = "IGNORE"; then
-                       ${ECHO} 1>&2 "$self: Ignoring checksum for $sfile"
-                       continue
-               fi
-               if ${TEST} ! -f $file; then
-                       ${ECHO} 1>&2 "$self: $file does not exist"
-                       exit 128
-               fi
-               if ${TEST} -z "$patch"; then
-                       checksum=`${DIGEST} $d_alg < $file`
-               else
-                       checksum=`${SED} -e '/[$]NetBSD.*/d' $file | ${DIGEST} $d_alg`
-               fi
-               if ${TEST} "$d_checksum" = "$checksum"; then
-                       ${ECHO} "=> Checksum $d_alg OK for $sfile"
-               else
-                       ${ECHO} 1>&2 "$self: Checksum $d_alg mismatch for $sfile"
-                       exit 1
-               fi
-               break
-       done
-  done
-  if ${TEST} -n "$files_left"; then
-       for file in $files_left; do
-               if ${TEST} -n "$algorithm"; then
-                       ${ECHO} 1>&2 "$self: No $algorithm checksum recorded for $file"
-               else
-                       ${ECHO} 1>&2 "$self: No checksum recorded for $file"
-               fi
-               exitcode=2
-       done
-  fi
-  exit $exitcode; } < $distinfo
diff -r ec2b54d01fc0 -r c1c61594b1b7 mk/checksum/checksum.awk
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/mk/checksum/checksum.awk  Thu Aug 27 11:45:45 2020 +0000
@@ -0,0 +1,308 @@
+#!/usr/bin/awk -f
+#
+# $NetBSD: checksum.awk,v 1.1 2020/08/27 11:45:45 jperkin Exp $
+#
+###########################################################################
+#
+# NAME
+#      checksum.awk -- checksum files
+#
+# SYNOPSIS
+#      checksum.awk [options] distinfo [file ...]
+#
+# DESCRIPTION
+#      checksum will verify the checksums in the distinfo file for each
+#      of the files specified.
+#
+#      The checksum utility exits with one of the following values:
+#
+#      0       All of the file checksums verify.
+#
+#      1       At least one of the file checksums did not match.
+#
+#      2       At least one of the files is missing any checksum.
+#
+#      >2      An error occurred.
+#
+# OPTIONS
+#      -a algorithm    Only verify checksums for the specified algorithm.
+#
+#      -p              The specified files are patches, so strip out any
+#                      lines containing NetBSD RCS ID tags before
+#                      computing the checksums for verification.
+#
+#      -s suffix       Strip the specified suffix from the file names
+#                      when searching for the checksum.
+#
+# BUGS
+#      The flow of this program is not performed in the most optimal way
+#      possible, as it was deemed important to retain output compatibility
+#      with the previous shell script implementation.
+#
+
+BEGIN {
+       DIGEST = ENVIRON["DIGEST"] ? ENVIRON["DIGEST"] : "digest"
+       SED = ENVIRON["SED"] ? ENVIRON["SED"] : "sed"
+
+       # Retain output compatible with previous "checksum" shell script
+       progname = "checksum"
+
+       only_alg = ""
+       distinfo = ""
+       exitcode = 0
+       patch = 0
+       suffix = ""
+
+       for (arg = 1; arg < ARGC; arg++) {
+               opt = ARGV[arg]
+               if (opt == "-a") {
+                       only_alg = ARGV[++arg]
+               } else if (opt == "-p") {
+                       patch = 1
+               } else if (opt == "-s") {
+                       suffix = ARGV[++arg]
+               } else if (opt == "--") {
+                       arg++
+                       break
+               } else if (match(opt, /^-.*/) != 0) {
+                       opt = substr(opt, RSTART + 1, RLENGTH)
+                       err(sprintf("%s: unknown option -- %s", progname, opt))
+                       usage()
+                       exit 3
+               } else {
+                       break
+               }
+       }
+
+       if (arg >= ARGC) {
+               usage()
+               exit 3
+       }
+
+       distinfo = ARGV[arg++]
+       cmd = sprintf("test -f %s", distinfo)
+       if (system(cmd) != 0) {
+               err(sprintf("%s: distinfo file missing: %s", progname,
+                   distinfo))
+               exit 3
+       }
+
+       #
+       # Initialise list of files to check, passed on the command line.  In
+       # order to keep things simple, distfiles[] is also used when operating
+       # in patch mode (-p).
+       #
+       while (arg < ARGC) {
+               distfile = ARGV[arg++]
+               sfile = distfile
+               if (suffix) {
+                       sfile = strip_suffix(sfile)
+               }
+               if (patch) {


