Subject: pkg_create dumping core (NetBSD-current)
To: None <tech-pkg@netbsd.org>
From: Robert Elz <kre@munnari.OZ.AU>
List: tech-pkg
Date: 04/27/2001 02:53:06
I'm not sure I believe this, but pkg_create has dumped core on
me on two packages in the past couple of days (just two):
automake-1.4 and pth-1.4.0.

I have installed lots of bigger (and some smaller) packages than
those two, no problems.  Those two repeatably fail exactly the same
way.

Now it also happens that those are the only two xxx-1.4 packages
I have attempted to install ... so just to disprove the ludicrous
possibility that it was the "1.4" that was causing problems, I
picked a small useless package (it happened to be xteddy...), changed its
version to 1.4, and installed that: no problem.  (I deleted it again too.)

I will keep debugging this tomorrow (later today) unless by some chance
someone who knows more about what is going on here fixes things while
I am snoozing...

It might be possible that there's some oddity in automake and pth which
is causing problems, but while fixing them might be worthwhile (if so)
it isn't the answer: pkg_create shouldn't be dumping core anyway.

The core dumps come deep inside the btree routines from libc ...

#0  0x80521ec in __bt_cmp (t=0x8060400, k1=0xbfbfc8a0, e=0x8060408)
    at bt_utils.c:192
#1  0x804fcef in __bt_search (t=0x8060400, key=0xbfbfc8a0, exactp=0xbfbfc86c)
    at bt_search.c:93
#2  0x804ddfd in __bt_get (dbp=0x805e1c0, key=0xbfbfc8a0, data=0xbfbfc898, 
    flags=0) at bt_get.c:95
#3  0x80532cc in pkgdb_retrieve (
    key=0xbfbfc8b4 "/usr/pkg/share/aclocal/maintainer.m4") at pkgdb.c:139
#4  0x804b79b in check_list (home=0x0, pkg=0xbfbfd170, 
    PkgName=0x805e01d "automake-1.4") at pl.c:154
#5  0x804aeed in pkg_perform (pkgs=0xbfbfd198) at perform.c:299
#6  0x804a446 in main (argc=1, argv=0xbfbfd250) at main.c:198
#7  0x8049e61 in ___start ()

Line 192 of bt_utils.c is ...

		bi = GETBINTERNAL(h, e->index);

where GETBINTERNAL() is ...

#define	GETBINTERNAL(pg, indx)						\
	((BINTERNAL *)(void *)((char *)(void *)(pg) + (pg)->linp[indx]))

and the dump happens because h is ...

(gdb) print *h
$6 = {pgno = 0, prevpg = 0, nextpg = 0, flags = 0, lower = 0, upper = 0, 
  linp = {0}}

or more particularly, because h->linp is NULL.    e->index is 32763, and
e->page == h (just in case that matters).
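
To make the failure mode concrete, here is a rough sketch of the kind
of bounds check that would reject this lookup before the macro
dereferences anything.  It is not the libc code: page_t is a simplified
stand-in for the page header shown above, and page_nslots(), page_item()
and check_zeroed_page() are made-up names used only for illustration.
The assumption is that "lower" marks the end of the linp[] array, so a
zeroed header has no valid slots at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t pgno_t;
typedef uint16_t indx_t;

/* Simplified stand-in for the btree page header printed by gdb above. */
typedef struct {
	pgno_t   pgno, prevpg, nextpg;
	uint32_t flags;
	indx_t   lower;    /* offset of the first free byte after linp[] */
	indx_t   upper;    /* offset of the first byte of item data */
	indx_t   linp[1];  /* item offsets; really variable length */
} page_t;

/* Hypothetical helper: how many linp[] slots the header claims to hold. */
static size_t
page_nslots(const page_t *pg)
{
	size_t hdr = offsetof(page_t, linp);

	if (pg->lower < hdr)
		return 0;       /* zeroed or corrupt header: no valid slots */
	return (pg->lower - hdr) / sizeof(indx_t);
}

/* Hypothetical checked variant of GETBINTERNAL(): NULL on a bad index. */
static const void *
page_item(const page_t *pg, size_t indx)
{
	if (indx >= page_nslots(pg))
		return NULL;
	return (const char *)(const void *)pg + pg->linp[indx];
}

/* The situation from the core dump: an all-zero page and index 32763. */
static void
check_zeroed_page(void)
{
	page_t h;

	memset(&h, 0, sizeof(h));
	assert(page_nslots(&h) == 0);
	assert(page_item(&h, 32763) == NULL);
}
```

Calling check_zeroed_page() exercises exactly the values seen in the
core: with every header field zero, index 32763 is rejected instead of
being used to read far past the start of the page.  Whether the real
problem is the zeroed page or the bogus index is still open.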

When the core dump happens, pkg_create (or something) has made
/var/db/pkg/automake-1.4  and placed an empty +CONTENTS file in there.
Nothing else.   Fortunately, this is enough for other stuff to see
the prerequisite as existing (which is the only reason I'd have automake
installed in the first place) though there do get to be a number of warnings
about unknown binary version when installing other stuff (they seem
harmless enough).

That's as far as I have gone so far.

All this is on an i386 with kernel sources cvs'd a few days ago, and
most userland from the latest (1.5T) snapshot.   These core dumps happened
with the pkg_create in that snapshot, and with the latest version installed
from pkgsrc (updated via cvs yesterday).  I think they are the same.   The
btree routines from libc I have been debugging also came from yesterday's
cvs update (not that I think any of that stuff changed recently).  The core
file lands in the pkgsrc directory (i.e. devel/automake etc.) if that is
relevant.  The cause is a SIGSEGV from the indirection through 0.

kre