Subject: Re: Problems with NetBSD-current kernels after 2005-10-04
To: Patrick Welche <prlw1@newn.cam.ac.uk>
From: Patrick Welche <prlw1@newn.cam.ac.uk>
List: current-users
Date: 10/22/2005 20:18:07

On Wed, Oct 19, 2005 at 05:32:44PM +0100, Patrick Welche wrote:
> On Sat, Oct 15, 2005 at 02:24:32PM -0700, Chuck Silvers wrote:
> > On Thu, Oct 13, 2005 at 05:47:10PM +0200, Klaus Klein wrote:
> > > Chuck Silvers wrote:
> > > > I've been able to reproduce the problem myself now, though it's very
> > > > inconsistent.  once it happened within a minute of starting the stress test,
> > > > but mostly it can run for hours without a problem.  I'll run the series of
> > > > tests that I asked for earlier and see which change is responsible.
> > > 
> > > As another data point, my system runs stable with the pool_cache change
> > > in but the SA change reverted.
> > 
> > actually the bug was in the pool_cache change, I just checked in a fix.
> 
> Weird: I just tried a yesterday kernel and had a freeze. Stupidly I also
> upgraded the userland, so can't easily go back to my old working
> 12 Sept kernel. Once again no info :-( I must set up a serial console..

This time I got some info: the crash is in named, so it's not what you
have been seeing:

Oct 22 19:23:02 quartz syslogd: Exiting on signal 15
uvm_fault(0xc0537b80, 0xdeadb000, 0, 1) -> 0xe
kernel: supervisor trap page fault, code=0
Stopped in pid 317.1 (named) at 0xdeadbeef:     invalid address
db{1}> bt
acpi_softc(cdad564c,c32bf800,cdb13f9c,c02d5d8b,0) at 0xdeadbeef
sa_switchcall(cdad564c,2b,2b,2b,2b) at netbsd:sa_switchcall+0x44
db{1}> sync
syncing disks... panic: TLB IPI rendezvous failed (mask 1)
Stopped in pid 317.1 (named) at netbsd:breakpoint+0x4:  leave
db{1}> sync

dump to dev 4,1 not possible
rebooting...

This happened with a kernel built from today's CVS sources, while
shutting down.  I remember restarting named when I had the freeze
mentioned in the original email, but I also had a build going at the
same time, so it wasn't obvious that the named restart was the cause...

Cheers,

Patrick