Current-Users archive


Re: Random reboots on 4.99.72



Sarton O'Brien wrote:
> On Tue, 5 Aug 2008 03:52:03 pm Paul Goyette wrote:
>> I've just updated my systems from 4.99.67 to 4.99.72 (sources current as of about 36 hours ago) and so far I've had at least half a dozen random reboots. In all cases there are no crash dumps. (Well, that's not entirely true: once I caught it dumping to disk, but savecore didn't find anything on the reboot.)
>>
>> This has happened at least four or five times on a very lightly-loaded dual-CPU amd64 box. This box does nothing except NAT and routing between the internal and external network. It has also happened just now for the second time on a fairly heavily loaded quad-CPU amd64 box, which is my NFS server and build machine.
>>
>> I don't have a serial console, and all boxes generally run X, so it's not very likely that I'll actually catch it in the act.
>>
>> Is anyone else seeing anything strange with -current? Any hints on how to debug this?
>
> Other than pkg_info dumping core .... nothing :)

I figured out what this was. It seems to be a pkgsrc issue. When I upgraded, I didn't plan out the package replacement; I just figured I'd wing it and see what I ran into. Doing a make update on perl somehow corrupted the pkg db, which in turn was making pkg_info dump core in libc's strnlen or mergesort.

The corruption was package-specific (probably perl-specific): I could remove one package and have pkg_info progress, only to trip on another corrupt package entry.
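Because pkg_info only tripped over one damaged entry at a time, a quick scan of the whole package database can list every suspect entry up front. The Python sketch below is hypothetical and not part of pkgsrc. It assumes the database lives in /var/db/pkg (the usual PKG_DBDIR), that every installed package's directory should hold non-empty +CONTENTS, +COMMENT and +DESC files, and that a healthy +CONTENTS contains an "@name" line. These are heuristics only: an entry that passes may still be damaged in ways pkg_info chokes on.

```python
import os

# Assumed default pkgsrc package database location (PKG_DBDIR);
# pass a different path if your system uses another one.
DEFAULT_PKG_DBDIR = "/var/db/pkg"

# Metadata files assumed to exist in every installed package's entry.
REQUIRED_FILES = ("+CONTENTS", "+COMMENT", "+DESC")


def find_suspect_entries(dbdir=DEFAULT_PKG_DBDIR):
    """Return (package, reason) pairs, in package-name order, for entries that look damaged."""
    suspects = []
    for name in sorted(os.listdir(dbdir)):
        path = os.path.join(dbdir, name)
        # Skip plain files in the db directory, such as an index database.
        if not os.path.isdir(path):
            continue
        # Flag metadata files that are missing or empty.
        for meta in REQUIRED_FILES:
            meta_path = os.path.join(path, meta)
            if not os.path.isfile(meta_path):
                suspects.append((name, "missing " + meta))
            elif os.path.getsize(meta_path) == 0:
                suspects.append((name, "empty " + meta))
        # Look inside a non-empty packing list for obvious damage.
        contents = os.path.join(path, "+CONTENTS")
        if os.path.isfile(contents) and os.path.getsize(contents) > 0:
            with open(contents, "rb") as f:
                data = f.read()
            # NUL bytes should never appear in a text packing list.
            if b"\0" in data:
                suspects.append((name, "NUL bytes in +CONTENTS"))
            # Assumption: a healthy packing list names its package with "@name".
            elif b"@name" not in data:
                suspects.append((name, "no @name line in +CONTENTS"))
    return suspects
```

Running `for pkg, reason in find_suspect_entries(): print(pkg, reason)` as a user who can read the db directory gives a list of candidates to pkg_delete (or remove by hand) in one pass, instead of discovering them one crash at a time.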

I always run pkg_chk -g beforehand, so it was trivial to pkg_delete -ff '*-*' and reinstall everything (I also keep prebuilt packages on hand).

I don't know if this is within the scope of what is supposed to happen, but the threading change is pretty major, so I wasn't awfully surprised.

On one system, the corruption was so bad that I had to blow away the pkg db by hand.

Anyway, I thought it best to mention this, as initially I was chasing various libc revisions until I found I still had the problem with a snapshot dated last year.

Looks like it might be worthwhile removing perl ... or all packages before updating ... unless there is a method for removing packages that may corrupt the db when removing their associated packages? That barely makes sense, so I doubt it :)


Sarton


