Sarton O'Brien wrote:
On Tue, 5 Aug 2008 03:52:03 pm Paul Goyette wrote:
I've just updated my systems from 4.99.67 to 4.99.72 (sources current as of about 36 hours ago) and so far I've had at least half a dozen random reboots. In all cases there are no crash dumps. (Well, that's not entirely true: once I caught it dumping to disk, but savecore didn't find anything on the reboot.)
This has happened at least four or five times on a very lightly-loaded dual-CPU amd64 box. This box does nothing except NAT and routing between the internal and external network. It has also happened just now for the second time on a fairly heavily loaded quad-CPU amd64 box, which is my NFS server and build machine.
I don't have serial console, and all boxes generally run X, so it's not very likely that I'll actually catch it in the act.
Is anyone else seeing anything strange with -current? Any hints on how to debug this?
Other than pkg_info dumping core .... nothing :)
I figured out what this was. It seems to be a pkgsrc issue. When I upgraded, I didn't plan out the package replacement; I just figured I'd wing it and see what I ran into. Doing a make update on perl somehow corrupted the pkg db, which in turn made pkg_info dump core in libc's strnlen or mergesort.
The corruption was package specific (probably perl specific): I could remove one package and have pkg_info progress, only to trip on another corrupt package entry.
I always pkg_chk -g beforehand so it was trivial to pkg_delete -ff '*-*' and reinstall everything (I also keep prebuilt packages on hand).
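For anyone wanting to repeat that, here is a rough sketch of the sequence as a shell script. It only prints the commands unless RUN=1 is set, since the delete step removes every installed package. The final pkg_chk -a -b step (add missing packages from binary packages) is an assumption about how the reinstall could be done, and the run and reinstall_all helper names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: prints each command unless RUN=1 is set in the environment.

# Hypothetical helper: execute the command, or just show it in dry-run mode.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Hypothetical helper wrapping the three steps.
reinstall_all() {
    run pkg_chk -g               # record the installed packages in pkgchk.conf
    run pkg_delete -ff '*-*'     # force-remove every installed package
    run pkg_chk -a -b            # assumption: re-add them from prebuilt binary packages
}

reinstall_all
```

Run it once without RUN=1 to check the printed commands before letting it touch the system.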
I don't know if this is within the scope of what is supposed to happen, but as the threading change is pretty major I wasn't awfully surprised.
On one system, the corruption was so bad I had to blow away the pkg db by hand.
Anyway, I thought it best to mention this, as initially I was chasing various libc revisions until I found I still had the problem from a snapshot dated last year.
Looks like it might be worthwhile removing perl ... or all packages before updating ... unless there is a method for removing packages that may corrupt the db when removing their associated packages? That barely makes sense, so I doubt it :)
Sarton