Subject: Re: muhah
To: Trevor Johnson <trevor@jpj.net>
From: Alistair Crooks <agc@pkgsrc.org>
List: tech-pkg
Date: 03/27/2001 10:33:33
On Mon, Mar 26, 2001 at 02:27:54PM -0500, Trevor Johnson wrote:
> > > system on NetBSD, Solaris, or most Linux distributions), pkgsrc doesn't
> > > bootstrap one into place.  If the user installs the wrong C compiler--a
> > > buggy, old, or trojaned one (food for thought:
> > > http://www.acm.org/classics/sep95/)--he loses.
> >
> > If the user hasn't installed a C compiler, then they're going to have
> > an interesting but unrewarding time building applications from source
> > using pkgsrc.
> 
> Yes.
> 
> > > > > > 2.  openssl produces output in a slightly different format from
> > > > > > md5(1).  I really don't want to have to pre-process everything with
> > > > > > sed or awk or expr.
> > > > >
> > > > > Can't the existing md5 utility still handle the MD5 hashes which were
> > > > > generated with it?  For the SHA-1 and RIPEMD-160 hashes, would you
> > > > > consider making the output of your digest utility have the same format as
> > > > > the output from OpenSSL?
> > > >
> > > > It's not a huge problem, it's a minor niggle. Yes, we can massage output
> > > > with sed, awk or expr. But I don't want to do that.
> > >
> > > You misunderstand.  What I requested is output from "digest sha1 foo" in
> > > the format that "openssl dgst -sha1 foo" has, and likewise for "digest
> > > rmd160 foo" to have the same format as "openssl dgst -rmd160".  That way,
> > > if it ever becomes desirable to use OpenSSL for hashing--for instance, in
> > > a future world where pre-1999 versions of NetBSD needn't be fully
> > > supported--such massaging will not be necessary.  I've appended a trivial
> > > patch which does this.  For the SHA-1 and RIPEMD-160 hashes, the slightly
> > > different output is unnecessary.  MD5 hashes have already been calculated,
> > > so I don't propose changing them.
> >
> > Sorry about the misunderstanding.  But when there are over 3000 files
> > which have been prepared using a standard NetBSD utility called
> > md5(1), it's a bit of an onerous job to have to reformat those 3000+
> > files, just because openssl outputs in a different format.  Yes, we
> > can massage the output by using sed, awk or expr, but, frankly, I
> > don't want to.
> 
> That's quite reasonable.  That's why I didn't propose changing the MD5s.

No, but you propose changing the output format for a utility which is
used to calculate checksums, which means that we'd have to change the
format of all these files. Oh, no, I've got it wrong - you want one format
for md5 and a different one for sha1 and rmd160/ripemd160, simply because
openssl uses a different format, and it's by no means certain that your
silky words have persuaded me to drop everything and run with openssl. OK,
turn to line 1290 or thereabouts in bsd.pkg.mk, where distfile checksums
are calculated when FAILOVER_FETCH is set, to see whether the distfile is
to be trusted or not:

                                if [ -n "${FAILOVER_FETCH}" -a -f ${DIGEST_FILE} -a -f ${_DISTDIR}/$$bfile ]; then \
                                        alg=`${AWK} 'NF == 4 && $$2 == "('$$file')" && $$3 == "=" {print $$1;}' ${DIGEST_FILE}`; \
                                        if [ -z "$$alg" ]; then         \
                                                alg=${DIGEST_ALGORITHM};\
                                        fi;                             \
                                        CKSUM=`${DIGEST} $$alg < ${_DISTDIR}/$$bfile`; \
                                        CKSUM2=`${AWK} '$$1 == "'$$alg'" && $$2 == "('$$file')" {print $$4;}' <${DIGEST_FILE}`; \
                                        if [ "$$CKSUM" = "$$CKSUM2" -o "$$CKSUM2" = "IGNORE" ]; then \
                                                continue 2;             \
                                        else                            \
                                                ${ECHO_MSG} "=> Checksum failure - trying next site."; \
                                        fi;                             \

That now becomes:

....

well, what does it become, exactly?  Because I'm not going to bother
making changes until you've persuaded me that that's the right way to
go.

When you've done modifying that part, please do the other 4 places in
bsd.pkg.mk where checksums are calculated.

 
> > Have a look at the logic used to identify checksums
> > and checksum algorithms in bsd.pkg.mk, and you'll see why I don't want
> > yet another format to parse.  I'm sorry, but I thought I'd put that in
> > my original mail.
> 
> Well, if y
> 
> Your digest utility introduced two new formats.  If my patch were
> incorporated in it, then there would be five formats altogether.  However,
> the number of formats could be reduced back to three by reformatting the
> few "md5" files that have been generated with your digest utility.

My digest utility has one output format, compatible with the old
md5(1) utility.  You want me to make sha1 and rmd160/ripemd160 digests
different in output format (although you have yet to convince me
why this would be a good idea), and we would then have to add your
custom logic to bsd.pkg.mk (see above).
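
To make the parsing concern concrete, here is a minimal sketch. It is not
the bsd.pkg.mk code itself: it reuses the same awk field test on two sample
lines. The file name foo.tar.gz is made up, and the openssl-style line is an
assumption about how "openssl dgst -sha1" lays out its result (no spaces
around the parentheses), so check it against your own openssl before relying
on it.

```shell
#!/bin/sh
# Sample lines only; foo.tar.gz is a made-up name and the hash is the
# SHA-1 of empty input.
hash=da39a3ee5e6b4b0d3255bfef95601890afd80709
bsd_line="SHA1 (foo.tar.gz) = $hash"    # md5(1)-compatible layout
ssl_line="SHA1(foo.tar.gz)= $hash"      # assumed openssl dgst layout

# Same field test as the bsd.pkg.mk fragment, with make's $$ written as $.
find_alg() {
	printf '%s\n' "$1" |
	    awk -v want="(foo.tar.gz)" \
	        'NF == 4 && $2 == want && $3 == "=" { print $1 }'
}

bsd_alg=`find_alg "$bsd_line"`
ssl_alg=`find_alg "$ssl_line"`
echo "md5(1)-style line:  alg='$bsd_alg'"
echo "openssl-style line: alg='$ssl_alg'"
```

Under those assumptions the md5(1)-style line splits into four fields and
the algorithm is found, while the openssl-style line splits into only two
and nothing matches, so the existing test would need a second code path.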
 
> > > > > > 3.  I want a message digest calculation utility that is small and
> > > > > > quick, and something that either is present on all Operating Systems
> > > > > > on which pkgsrc runs, or is buildable on those operating systems with
> > > > > > minimum fuss. openssl does not really fit the bill here.
> > >
> > > > In the whole scheme of things, though, with all the other processing
> > > > that is taking place at that time, the speed of the code produced by
> > > > an optimising compiler vs.  hand-tuned assembly code is fairly low on
> > > > my list of priorities.
> > >
> > > Thanks for dropping the objection.
> >
> > Au contraire - I have not dropped any objection.
> >
> > And I do resent such an implication.
> >
> > My apologies, however, if you're just being disingenuous.
> 
> If I were, it would be I who would owe an apology, wouldn't it?

OK, no apology, so you weren't being disingenuous.

Isn't selective deletion wonderful?

Soon, it'll be easy enough to make me responsible for all the world's ills,
so I'll short-circuit that:

Yes, I am still beating my wife.

> > What I DID write, however, was:
> >
> > > Now I'm not a statistician, but the sample size isn't huge, I don't
> > > know what was taking place on your machine while you were taking your
> > > measurements, and, all in all, the results aren't all that conclusive,
> > > are they? I mean, to take your figures above, digest takes 65% of the
> > > time of openssl to calculate sha1 digests.
> > >
> > > OTOH, I wouldn't be that surprised if openssl was actually quicker,
> > > since openssl uses assembly code on the more popular architectures.
> > >
> > > In the whole scheme of things, though, with all the other processing
> > > that is taking place at that time, the speed of the code produced by
> > > an optimising compiler vs.  hand-tuned assembly code is fairly low on
> > > my list of priorities.
> 
> Yes, that's what you wrote.  I've quoted it all now.  As I understood your
> original statement that you wanted something "quick" and OpenSSL "does not
> fit the bill", I took it to mean that in your testing, OpenSSL was too
> slow for practical use.  Now you seem to be saying that my testing is
> statistically invalid and inconclusive, that you haven't done any
> conclusive testing of your own, OpenSSL may be faster in sum^H^Home cases,
> speed is a low priority.  Nonetheless you still object that OpenSSL is too
> slow.  Is that a fair summary?  If not, kindly rephrase.  I don't mean to
> be unfair.
> 
> If performance is still crucial for you, would you please describe what
> benchmark or criterion would satisfy you?

Compile time.
Source size.
Modifiability.
Output format.
Usability.
Decent licence.
No export funnies.

and finally, much behind the rest in priority:

execution speed.

What I mean by that is that I don't want a dog, in execution terms,
calculating my digest algorithm of choice, which is sha1.  And as openssl
took, by your figures, roughly 1.66 times as long as digest, I suggest you
start optimising openssl straight away, in addition to all the changes
to bsd.pkg.mk that are necessary.
 
> > Perhaps you'd like to go away
> 
> :-)
> 
> > and cut down openssl so that it's < 200K in source form, is quick to
> > compile,
> 
> For people who already have it, such as those who have it in the base
> system, these are non-issues.  Eventually, it's likely that the great
> majority of users will be in this category.

And for the rest of them?

You obviously think we should just ignore people who are running old
releases of NetBSD.  That will happen over my dead body.  I already
explained that I didn't have control over what went into Solaris or
Linux - why should they be disadvantaged, simply because you think
that openssl is the answer?
 
> > is statically linked,
> 
> Although I read your earlier statement about this ("we really need a
> statically-linked utility, if ever we're going to install a system from
> packages of any kind"), I didn't follow the reasoning.  Perhaps what you
> had in mind was a situation where the hashing utility were dynamically
> linked and the user replaced the libraries on which it depended, without
> replacing it at the same time.  Clearly that would be broken.  It's not
> clear to me that statically linking the hashing utility would be the only
> way to avoid that problem.

No, I meant statically-linked.

For example, the sha1 digest code in libc.so has alignment problems on
Alphas on NetBSD < 1.5T.
 
> > has no patches to be applied,
> 
> Is the reasoning here that if there are patches, the hashes for those
> patches must be tested, and since we're preparing the hashing utility, we
> encounter a chicken-and-egg problem?  Since this doesn't apply for people
> who have OpenSSL in the base system, I assume you're talking about people
> who install it from pkgsrc.  Checking hashes can be done to detect
> accidental file corruption, or malicious changes.  For detecting
> accidental damage, MD5s are adequate (overkill, really).  If there were
> malicious changes, a malefactor who could change the patch files could
> just as easily change the "md5" files.  Therefore the SHA-1 and RIPEMD-160
> hashes provide no extra protection to speak of.  Therefore, the md5
> utility could be used to avoid the chicken-and-egg problem.

How could the files be "just as easily change"d? This isn't another of
those "trojaned C compiler" stories, is it?

Let me get this straight, then:

Drawbacks:

- 10 MB of openssl source to be added to pkgsrc.
- Logic to be added to check what version of openssl is installed.
- Logic to be changed in bsd.pkg.mk for checksum calculation, if
	openssl installed.
- Anything up to 3000 files have to be changed from md5(1) format
	to openssl-format
- Every computer with pkgsrc on it gets openssl installed as
	part of the pkgsrc bootstrapping process
- (To say nothing of sha1 digests being slower to calculate using
	openssl)

Benefits gained:

+ Remind me again why this is such a good idea?

> > and provides md5(1)-compatible output
> 
> The whole purpose of the exercise is to replace, or at least augment, MD5
> with longer hashes.  If MD5 is abandoned, then this becomes a non-issue.

No, please read what I wrote above about why the output format
needs to be md5(1)-compatible.
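
For reference, the kind of massaging in question might look like the sketch
below. It is only an illustration: the helper name to_md5_style and the file
name are hypothetical, and the openssl-style input line is an assumption
about that tool's output layout. A step like this would be needed wherever
openssl output is compared against the existing md5(1)-format files.

```shell
#!/bin/sh
# Hypothetical helper: rewrite "ALG(file)= hash" as "ALG (file) = hash".
to_md5_style() {
	sed -e 's/^\([A-Za-z0-9-]*\)(\(.*\))= /\1 (\2) = /'
}

# Made-up file name; the hash is the SHA-1 of empty input.
ssl_line='SHA1(foo.tar.gz)= da39a3ee5e6b4b0d3255bfef95601890afd80709'
converted=`printf '%s\n' "$ssl_line" | to_md5_style`
echo "$converted"

# The converted line now has the four fields the existing awk tests expect.
nfields=`printf '%s\n' "$converted" | awk '{ print NF }'`
echo "fields: $nfields"
```

It works, but it is one more pipeline stage to carry around and to get
wrong, which is the point being made about the output format.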
 
> > and is distributed with a BSD licence,
> 
> Strictly speaking, it's not under a BSD license, because the copyright
> does not belong to the Regents of the University of California.  However,
> the authors' description of it as "BSD-style" seems accurate enough, given
> the mostly identical terms (IANAL). I've appended it for you.
> 
> > then we'll consider switching over to using openssl as part of pkgsrc.
> 
> Thanks!

You're welcome!

Regards,
Alistair

> -- 
> Trevor Johnson
> http://jpj.net/~trevor/gpgkey.txt
> 
> 
> 
>   LICENSE ISSUES
>   ==============
> 
>   The OpenSSL toolkit stays under a dual license, i.e. both the conditions of
>   the OpenSSL License and the original SSLeay license apply to the toolkit.
>   See below for the actual license texts. Actually both licenses are BSD-style
>   Open Source licenses. In case of any license issues related to OpenSSL
>   please contact openssl-core@openssl.org.
> 
>   OpenSSL License
>   ---------------
> 
> /* ====================================================================
>  * Copyright (c) 1998-2001 The OpenSSL Project.  All rights reserved.
>  *
>  * Redistribution and use in source and binary forms, with or without
>  * modification, are permitted provided that the following conditions
>  * are met:
>  *
>  * 1. Redistributions of source code must retain the above copyright
>  *    notice, this list of conditions and the following disclaimer.
>  *
>  * 2. Redistributions in binary form must reproduce the above copyright
>  *    notice, this list of conditions and the following disclaimer in
>  *    the documentation and/or other materials provided with the
>  *    distribution.
>  *
>  * 3. All advertising materials mentioning features or use of this
>  *    software must display the following acknowledgment:
>  *    "This product includes software developed by the OpenSSL Project
>  *    for use in the OpenSSL Toolkit. (http://www.openssl.org/)"
>  *
>  * 4. The names "OpenSSL Toolkit" and "OpenSSL Project" must not be used to
>  *    endorse or promote products derived from this software without
>  *    prior written permission. For written permission, please contact
>  *    openssl-core@openssl.org.
>  *
>  * 5. Products derived from this software may not be called "OpenSSL"
>  *    nor may "OpenSSL" appear in their names without prior written
>  *    permission of the OpenSSL Project.
>  *
>  * 6. Redistributions of any form whatsoever must retain the following
>  *    acknowledgment:
>  *    "This product includes software developed by the OpenSSL Project
>  *    for use in the OpenSSL Toolkit (http://www.openssl.org/)"
>  *
>  * THIS SOFTWARE IS PROVIDED BY THE OpenSSL PROJECT ``AS IS'' AND ANY
>  * EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
>  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
>  * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE OpenSSL PROJECT OR
>  * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>  * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
>  * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
>  * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
>  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
>  * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
>  * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
>  * OF THE POSSIBILITY OF SUCH DAMAGE.
>  * ====================================================================
>  *
>  * This product includes cryptographic software written by Eric Young
>  * (eay@cryptsoft.com).  This product includes software written by Tim
>  * Hudson (tjh@cryptsoft.com).
>  *
>  */
> 
>  Original SSLeay License
>  -----------------------
> 
> /* Copyright (C) 1995-1998 Eric Young (eay@cryptsoft.com)
>  * All rights reserved.
>  *
>  * This package is an SSL implementation written
>  * by Eric Young (eay@cryptsoft.com).
>  * The implementation was written so as to conform with Netscapes SSL.
>  *
>  * This library is free for commercial and non-commercial use as long as
>  * the following conditions are aheared to.  The following conditions
>  * apply to all code found in this distribution, be it the RC4, RSA,
>  * lhash, DES, etc., code; not just the SSL code.  The SSL documentation
>  * included with this distribution is covered by the same copyright terms
>  * except that the holder is Tim Hudson (tjh@cryptsoft.com).
>  *
>  * Copyright remains Eric Young's, and as such any Copyright notices in
>  * the code are not to be removed.
>  * If this package is used in a product, Eric Young should be given attribution
>  * as the author of the parts of the library used.
>  * This can be in the form of a textual message at program startup or
>  * in documentation (online or textual) provided with the package.
>  *
>  * Redistribution and use in source and binary forms, with or without
>  * modification, are permitted provided that the following conditions
>  * are met:
>  * 1. Redistributions of source code must retain the copyright
>  *    notice, this list of conditions and the following disclaimer.
>  * 2. Redistributions in binary form must reproduce the above copyright
>  *    notice, this list of conditions and the following disclaimer in the
>  *    documentation and/or other materials provided with the distribution.
>  * 3. All advertising materials mentioning features or use of this software
>  *    must display the following acknowledgement:
>  *    "This product includes cryptographic software written by
>  *     Eric Young (eay@cryptsoft.com)"
>  *    The word 'cryptographic' can be left out if the rouines from the library
>  *    being used are not cryptographic related :-).
>  * 4. If you include any Windows specific code (or a derivative thereof) from
>  *    the apps directory (application code) you must include an acknowledgement:
>  *    "This product includes software written by Tim Hudson (tjh@cryptsoft.com)"
>  *
>  * THIS SOFTWARE IS PROVIDED BY ERIC YOUNG ``AS IS'' AND
>  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
>  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
>  * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
>  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
>  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
>  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
>  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
>  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
>  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
>  * SUCH DAMAGE.
>  *
>  * The licence and distribution terms for any publically available version or
>  * derivative of this code cannot be changed.  i.e. this code cannot simply be
>  * copied and put under another distribution licence
>  * [including the GNU Public Licence.]
>  */
>