tech-kern archive


Re: Status of revivesa



On Thu, Sep 25, 2008 at 08:34:11PM +0200, Manuel Bouyer wrote:
> On Thu, Sep 25, 2008 at 11:26:13AM -0700, Bill Stouder-Studenmund wrote:
> > On Thu, Sep 25, 2008 at 10:57:57AM -0700, Bill Stouder-Studenmund wrote:
> > > On Thu, Sep 25, 2008 at 04:58:46PM +0200, Manuel Bouyer wrote:
> > > > # /usr/sbin/named -v
> > > > BIND 9.3.0
> > > > # /usr/sbin/named   
> > > > # uvm_fault(0xc5877b60, 0, 1) -> 0xe
> > > > fatal page fault in supervisor mode
> > > > trap type 6 code 0 eip c02d60df cs 9 eflags 10246 cr2 0 ilevel 7
> > > > kernel: supervisor trap page fault, code=0
> > > > Stopped in pid 12.2 (named) at  netbsd:sa_getcachelwp+0x2f:     movl    0xc4(%ebx),%edx
> > > > db> tr
> > > > sa_getcachelwp(c5872588,c66e1f80,c65e3b5c,c02cb957,c66e1f82,0,c5872588,c03a4fa7,c66e1f80,0)
> > > >  at netbsd:sa_getcachelwp+0x2f
> > > 
> > > Ok, I'm confused. How was the kernel compiled? I'm guessing not 
> > > DIAGNOSTIC... There are asserts that should make it clear as to what may 
> > > be wrong here.
> > > 
> > > > sa_switch(c6611c40,c0427797,1,c5864fa8,c04adcfc,c6611c40,c65e3bdc,c02b5ebf,0,0)
> > > >  at netbsd:sa_switch+0x2a6
> > > > sleepq_block(0,0,c0427797,c0449770,c58725ac,0,c65e3bfc,0,c04add74,c5872588)
> > > >  at netbsd:sleepq_block+0x138
> > > > cv_wait(c58725ac,c5864f3c,0,c03a4fa7,c6611c40,ffffffff,0,c03a4fa7,c6611dd0,c6611c40)
> > > >  at netbsd:cv_wait+0xef
> > > > sigexit(c6611c40,b,b,0,c06f6f90,0,c6611dc0,0,0,0) at 
> > > > netbsd:sigexit+0x17e
> > > 
> > > Looks like there's an issue here too. This is a silly time to be firing 
> > > off a "BLOCKED" upcall.
> > 
> > Yes, it is a silly time. Turns out that this code:
> > 
> >         if (p->p_flag & P_WEXIT) {
> >                 mi_switch(l, NULL);
> >                 return;
> >         }
> > 
> > didn't get translated to -current right. Should be fixed in kern_sa.c rev 
> > 1.91.2.41.

Sorry, that should be 1.91.2.42.

> Sorry, I still get the panic:
> trap type 6 code 0 eip c02d60df cs 9 eflags 10246 cr2 0 ilevel 7
> kernel: supervisor trap page fault, code=0
> Stopped in pid 6.2 (named) at   netbsd:sa_getcachelwp+0x2f:     movl    0xc4(%ebx),%edx
> db> tr
> sa_getcachelwp(c5872760,c6618f80,c5c9fb5c,c02cb957,c6618f82,0,c5872760,c03a4fa7,c6618f80,0)
>  at netbsd:sa_getcachelwp+0x2f
> sa_switch(c586a280,c0427797,1,c5864fa8,c04add14,c586a280,c5c9fbdc,c02b5ebf,0,0)
>  at netbsd:sa_switch+0x298
> sleepq_block(0,0,c0427797,c0449770,c5872784,0,c5865e84,0,c04adcb4,c5872760) 
> at netbsd:sleepq_block+0x138
> cv_wait(c5872784,c5864f48,0,c03a4fa7,c586a280,ffffffff,0,c03a4fa7,c586a410,c586a280)
>  at netbsd:cv_wait+0xef
> sigexit(c586a280,b,b,0,c06f7f90,0,c586a400,0,0,0) at netbsd:sigexit+0x17e
> postsig(b,c5c9fd00,c5c9fcac,c0399293,c661200c,b,c6617fa0,c5872760,c586a280,c5c9fd30)
>  at netbsd:postsig+0xfd
> lwp_userret(c586a280,c5c9fd00,c5c9fda0,2,fffffffc,1,c5c9fd2c,b93ffcd8,4,0) at 
> netbsd:lwp_userret+0x168
> trap() at netbsd:trap+0x4c8
> 
> lavardin:/home/bouyer>ident netbsd-XENU |grep kern_sa.c
>      $NetBSD: kern_sa.c,v 1.91.2.42 2008/09/25 18:24:20 wrstuden Exp $

Ok. We now set PS_WEXIT a lot later than we used to. Before, we'd set 
P_WEXIT (the old name for this flag) in sigexit(). Now we don't, so the 
flag is still clear when sigexit() blocks in cv_wait().

Also, the thread that decides to core the whole process will not have 
LW_WEXIT set on itself. So we have to look at PS_WCORE.
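
A minimal sketch of the kind of test this implies, written as standalone 
C so it can be compiled outside the kernel. The flag names PS_WEXIT, 
PS_WCORE and LW_WEXIT come from the discussion above; the bit values, the 
two cut-down structs, their field names, and the helper 
sa_exit_in_progress() are assumptions made for illustration only. This is 
not the code in rev 1.91.2.43, which may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit values; the real definitions live in the kernel headers. */
#define PS_WEXIT 0x1u   /* process is exiting */
#define PS_WCORE 0x2u   /* process is dumping core */
#define LW_WEXIT 0x1u   /* this LWP has been told to exit */

/* Cut-down stand-ins for struct proc and struct lwp (illustration only). */
struct proc_sketch {
        unsigned int p_sflag;
};

struct lwp_sketch {
        unsigned int l_flag;
};

/*
 * Return true when no BLOCKED upcall should be generated because the
 * process is on its way out. PS_WCORE is tested as well as the exit
 * flags, because the LWP that starts the core dump does not have
 * LW_WEXIT set on itself and PS_WEXIT is only set later.
 */
static bool
sa_exit_in_progress(const struct proc_sketch *p, const struct lwp_sketch *l)
{
        assert(p != NULL && l != NULL);

        if ((p->p_sflag & (PS_WEXIT | PS_WCORE)) != 0)
                return true;
        return (l->l_flag & LW_WEXIT) != 0;
}
```

In sa_switch() a test of this shape would take the place of the P_WEXIT 
check quoted earlier: when it is true, switch away directly instead of 
looking for a cached LWP to run the upcall.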

Please try rev 1.91.2.43.

If this doesn't work, I'll need to try it myself tonight.

Take care,

Bill



