Subject: Re: Problems with NetBSD-current kernels after 2005-10-04
To: Chuck Silvers <chuq@chuq.com>
From: Patrick Welche <prlw1@newn.cam.ac.uk>
List: current-users
Date: 10/23/2005 15:57:37
On Sat, Oct 22, 2005 at 01:54:34PM -0700, Chuck Silvers wrote:
> hi,
> 
> On Sat, Oct 22, 2005 at 08:18:07PM +0100, Patrick Welche wrote:
> > The info says it's named, so it's not what you have been seeing:
> > 
> > Oct 22 19:23:02 quartz syslogd: Exiting on signal 15
> > uvm_fault(0xc0537b80, 0xdeadb000, 0, 1) -> 0xe
> > kernel: supervisor trap page fault, code=0
> > Stopped in pid 317.1 (named) at 0xdeadbeef:     invalid address
> > db{1}> bt
> > acpi_softc(cdad564c,c32bf800,cdb13f9c,c02d5d8b,0) at 0xdeadbeef
> > sa_switchcall(cdad564c,2b,2b,2b,2b) at netbsd:sa_switchcall+0x44
> > db{1}> sync
> > syncing disks... panic: TLB IPI rendezvous failed (mask 1)
> > Stopped in pid 317.1 (named) at netbsd:breakpoint+0x4:  leave
> > db{1}> sync
> > 
> > dump to dev 4,1 not possible
> > rebooting...
> > 
> > This happened with today's CVS, while shutting down.  I remember
> > restarting named when I had the freeze mentioned in the original
> > email, but also had a build going at the same time, so it wasn't
> > obviously the named restart...

I can reproduce this at will simply with "/etc/rc.d/named restart".

> ok, that does look like a bug in the sa change.  we're jumping through
> a function pointer that has 0xdeadbeef as the value.  most likely the
> sau is being freed before we try to use it, but I don't see where.
> 
> next time you see this, try "reboot 0x104" instead of "sync";
> that's more likely to succeed in getting a dump.
> 
> ...on second thought, the "not possible" message is because either
> no dump device is configured or the device is too small to hold a dump.

Bother: I tried boot -a with the dump device on sd1e, but got:

kernel: supervisor trap page fault, code=0
Stopped in pid 337.1 (named) at 0xdeadbeef:     invalid address
db{0}> bt
acpi_softc(cda7464c,c0513100,cd81ff9c,c02d0813,0) at 0xdeadbeef
sa_switchcall(cda7464c,2b,2b,2b,2b) at netbsd:sa_switchcall+0x44
db{0}> sync
panic: TLB IPI rendezvous failed (mask 2)
Stopped in pid 337.1 (named) at netbsd:breakpoint+0x4:  leave
db{0}> sync

dump to dev 4,12 not possible
rebooting...

The dump partition doesn't *have* to be swap, does it?
#        size    offset     fstype [fsize bsize cpg/sgs]
 e:   8401200   1218174     4.2BSD   1024  8192 46168  # (Cyl.    239*-   1893*)
(it wasn't mounted)

I'll try the reboot 0x104 next...

Cheers,

Patrick