Re: kern/48098: panic: kernel diagnostic assertion

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost,marcotte%panix.com@localhost
Subject: Re: kern/48098: panic: kernel diagnostic assertion
From: Brian Marcotte <marcotte%panix.com@localhost>
Date: Mon, 28 Oct 2013 12:25:01 +0000 (UTC)

The following reply was made to PR kern/48098; it has been noted by GNATS.

From: Brian Marcotte <marcotte%panix.com@localhost>
To: "S.P.Zeidler" <spz%NetBSD.org@localhost>
Cc: gnats-bugs%gnats.NetBSD.org@localhost, Brian Marcotte <marcotte%panix.com@localhost>
Subject: Re: kern/48098: panic: kernel diagnostic assertion
Date: Mon, 28 Oct 2013 08:20:30 -0400

 > could you please test if the attached patch (kudos mlelstv%NetBSD.org@localhost)
 > helps with the tcpdrop issue? it's only lightly tested so maybe not
 > on your most precious server. :)
 
 That patch seems to make things worse. With it, I get this panic using
 tcpdrop on connections in FIN_WAIT_1 (possibly others):
 
   Mutex error: mutex_vector_enter: locking against myself
 
   lock address : 0x00000000c104ff40
   current cpu  :                  0
   current lwp  : 0x00000000c24fa540
   owner field  : 0x00000000c24fa540 wait/spin:                0/0
 
   panic: lock error
   cpu0: Begin traceback...
   panic(c0408709,c03f8468,c03d8841,c03f8437,c104ff40,0,c24fa540,c0428ae0,c0428ae0,d8651a9c) at netbsd:panic+0x18
   lockdebug_abort(c104ff40,c0428ae0,c03d8841,c03f8437,c104ff40,c24fa540,d8651acc,c01eb13b,c208fa00,ffffffff) at netbsd:lockdebug_abort+0x2f
   mutex_abort(c208fa00,ffffffff,c6f53133,c257921c,c24d3744,bb018588,c12feab0,c104ff40,c24f5174,0) at netbsd:mutex_abort+0x32
   mutex_vector_enter(c104ff40,3331f5c6,6fcb,ed0854a6,bb01,0,0,d8651b2c,c24d3744,2) at netbsd:mutex_vector_enter+0x33b
   sysctl_net_inet_tcp_ident(d8651c94,0,0,d8651cb4,bf7fec70,100,d8651c84,c24fa540,c106b9c0,4) at netbsd:sysctl_net_inet_tcp_ident+0x383
   sysctl_dispatch(d8651c84,4,0,d8651cb4,bf7fec70,100,d8651c84,c24fa540,c106b9c0,d8651cb4) at netbsd:sysctl_dispatch+0xc7
   sys___sysctl(c24fa540,d8651d00,d8651d28,0,c132bbd0,ca,0,bb78b000,d8651d00,c256ba58) at netbsd:sys___sysctl+0xea
   syscall(d8651d48,b3,ab,bf7f001f,806001f,4,0,bf7fe3fc,bb7c65bc,0) at netbsd:syscall+0xaa
   cpu0: End traceback...
 
 > For the reason there are stuck connections in the first place:
 > when you have some, could you please check if an apache thread is
 > stuck either in soaccept or kauth_cred_dup, from do_sys_accept?
 
 I don't see those as the WCHAN in our "ps" logs, so it sounds like
 something to check in ddb. Are you asking me to do this for all of the
 apache processes?
 
        bt /t 0t[pid]
 
 I managed to do this while there were stray ESTABLISHED connections, but
 no thread was in soaccept or kauth_cred_dup. I only see these stacks:
 
   sleepq_block(...) at netbsd:sleepq_block+0xad
   sel_do_scan(...) at netbsd:sel_do_scan+0x4a8
   selcommon(...) at netbsd:selcommon+0x1f9
   sys___select50(...) at netbsd:sys___select50+0x77
   syscall(...) at netbsd:syscall+0xaa
 
   sleepq_block(...) at netbsd:sleepq_block+0xad
   cv_wait_sig(...) at netbsd:cv_wait_sig+0x103
   lf_advlock(...) at netbsd:lf_advlock+0x44a
   ufs_advlock(...) at netbsd:ufs_advlock+0x36
   VOP_ADVLOCK(...) at netbsd:VOP_ADVLOCK+0x41
   sys_flock(...) at netbsd:sys_flock+0xe2
   syscall(...) at netbsd:syscall+0xaa
 
   sleepq_block(...) at netbsd:sleepq_block+0xad
   cv_timedwait_sig(...) at netbsd:cv_timedwait_sig+0x101
   kevent1(...) at netbsd:kevent1+0x45a
   sys___kevent50(...) at netbsd:sys___kevent50+0x45
   syscall(...) at netbsd:syscall+0xaa
 
 This is the normal state of things. The apache processes are normally
 waiting in "lockf", "select", or "kqueue".
 
 Interestingly, with the DEBUG/DIAGNOSTIC/LOCKDEBUG kernels, if I run that
 backtrace in ddb and then "cont", I get this panic (even without your patch):
 
   panic: SPL NOT LOWERED ON TRAP EXIT
 
   cpu0: Begin traceback...
   panic(c0102c0f,ca010011,c0110031,1d6a0011,28f0011,c0b5fe08,0,ca015f38,c066d005,2b) at netbsd:panic+0x18
   alltraps(c01eb01d,c0431702,5,1,6,ca015f94,c039cd28,c0ad7668,c066d005,1) at netbsd:alltraps+0x17e
   xencons_tty_input(c0ad7668,c066d005,1,6,ca015fa8,ca015f94,c01e5164,c054f800,c01eb01d,c043496a) at netbsd:xencons_tty_input+0xb8
   xencons_handler(c0ad7668,2,c0adc128,ca015fe8,c013ec6b,c0adc128,ca174ca8,ca015fec,c016fa87,a) at netbsd:xencons_handler+0x78
   intr_biglock_wrapper(c0adc128,ca174ca8,ca015fec,c016fa87,a,2,0,0,c0434968,40) at netbsd:intr_biglock_wrapper+0x1f
   evtchn_do_event(2,ca174ca8,ca174c54,0,0,0,0,0,0,0) at netbsd:evtchn_do_event+0x16b
   --- switch to interrupt stack ---
   call_evtchn_do_event(ca174ca8,0,c0420011,ca170031,c0200011,11,c0b46d40,c04214c0,ca174d04,1) at netbsd:call_evtchn_do_event+0x1e
   hypervisor_callback(c0b48d20,0,0,ca174da0,c0b48d20,c0b48d20,c01e0cf0,c0b48d20,0,c01000a1) at netbsd:hypervisor_callback+0x64
   idle_loop(c0b48d20,67c000,c057b200,0,c010006d,0,0,0,0,0) at netbsd:idle_loop+0x17c
   cpu0: End traceback...
 
 That doesn't prevent me from getting the trace on the processes, so it's
 not a big problem.
 
 Thanks.
 
 --
 - Brian
