NetBSD-Bugs archive
Re: kern/48098: panic: kernel diagnostic assertion
The following reply was made to PR kern/48098; it has been noted by GNATS.
From: Brian Marcotte <marcotte%panix.com@localhost>
To: "S.P.Zeidler" <spz%NetBSD.org@localhost>
Cc: gnats-bugs%gnats.NetBSD.org@localhost, Brian Marcotte <marcotte%panix.com@localhost>
Subject: Re: kern/48098: panic: kernel diagnostic assertion
Date: Mon, 28 Oct 2013 08:20:30 -0400
> could you please test if the attached patch (kudos mlelstv%NetBSD.org@localhost)
> helps with the tcpdrop issue? it's only lightly tested so maybe not
> on your most precious server. :)
That patch seems to make things worse. With it, I get this panic using
tcpdrop on connections in FIN_WAIT_1 (possibly others):
Mutex error: mutex_vector_enter: locking against myself
lock address : 0x00000000c104ff40
current cpu : 0
current lwp : 0x00000000c24fa540
owner field : 0x00000000c24fa540 wait/spin: 0/0
panic: lock error
cpu0: Begin traceback...
panic(c0408709,c03f8468,c03d8841,c03f8437,c104ff40,0,c24fa540,c0428ae0,c0428ae0,d8651a9c)
at netbsd:panic+0x18
lockdebug_abort(c104ff40,c0428ae0,c03d8841,c03f8437,c104ff40,c24fa540,d8651acc,c01eb13b,c208fa00,ffffffff)
at netbsd:lockdebug_abort+0x2f
mutex_abort(c208fa00,ffffffff,c6f53133,c257921c,c24d3744,bb018588,c12feab0,c104ff40,c24f5174,0)
at netbsd:mutex_abort+0x32
mutex_vector_enter(c104ff40,3331f5c6,6fcb,ed0854a6,bb01,0,0,d8651b2c,c24d3744,2)
at netbsd:mutex_vector_enter+0x33b
sysctl_net_inet_tcp_ident(d8651c94,0,0,d8651cb4,bf7fec70,100,d8651c84,c24fa540,c106b9c0,4)
at netbsd:sysctl_net_inet_tcp_ident+0x383
sysctl_dispatch(d8651c84,4,0,d8651cb4,bf7fec70,100,d8651c84,c24fa540,c106b9c0,d8651cb4)
at netbsd:sysctl_dispatch+0xc7
sys___sysctl(c24fa540,d8651d00,d8651d28,0,c132bbd0,ca,0,bb78b000,d8651d00,c256ba58)
at netbsd:sys___sysctl+0xea
syscall(d8651d48,b3,ab,bf7f001f,806001f,4,0,bf7fe3fc,bb7c65bc,0)
at netbsd:syscall+0xaa
cpu0: End traceback...
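As a rough illustration of what the lockdebug report above means: the owner field (0x00000000c24fa540) is the same as the current lwp, so the thread that already holds the mutex tried to take it again. The Python sketch below is only a userland toy model of that check, not kernel code. The class and function names (OwnerTrackingMutex, lookup_then_drop, drop_connection) are hypothetical, and it makes no claim about which code path in sysctl_net_inet_tcp_ident actually re-takes the lock.

```python
import threading


class LockError(RuntimeError):
    """Raised where a LOCKDEBUG kernel would panic with "lock error"."""


class OwnerTrackingMutex:
    """Non-recursive mutex that remembers its owner (toy model only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None  # thread ident of the current holder, or None

    def enter(self):
        me = threading.get_ident()
        # Same test the report describes: owner field == current thread.
        if self._owner == me:
            raise LockError("locking against myself")
        self._lock.acquire()
        self._owner = me

    def exit(self):
        if self._owner != threading.get_ident():
            raise LockError("exit by non-owner")
        self._owner = None
        self._lock.release()


def drop_connection(mutex):
    """Hypothetical helper that takes the lock on its own."""
    mutex.enter()
    mutex.exit()


def lookup_then_drop(mutex):
    """Hypothetical caller that already holds the lock when it calls the helper."""
    mutex.enter()
    try:
        drop_connection(mutex)  # second enter by the same owner
    finally:
        mutex.exit()
```

Calling lookup_then_drop(OwnerTrackingMutex()) raises LockError from the inner enter(), which is the userland analogue of the panic shown in the traceback.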
> For the reason there are stuck connections in the first place:
> when you have some, could you please check if an apache thread is
> stuck either in soaccept or kauth_cred_dup, from do_sys_accept?
I don't see those in the WCHAN in our "ps" logs, so that sounds like
something to do in ddb. Are you asking that I do this on all the apache
processes?
bt /t 0t[pid]
I managed to do this when there were stray ESTABLISHED connections, but
I don't see any of what you asked for. I only see these:
sleepq_block(...) at netbsd:sleepq_block+0xad
sel_do_scan(...) at netbsd:sel_do_scan+0x4a8
selcommon(...) at netbsd:selcommon+0x1f9
sys___select50(...) at netbsd:sys___select50+0x77
syscall(...) at netbsd:syscall+0xaa

sleepq_block(...) at netbsd:sleepq_block+0xad
cv_wait_sig(...) at netbsd:cv_wait_sig+0x103
lf_advlock(...) at netbsd:lf_advlock+0x44a
ufs_advlock(...) at netbsd:ufs_advlock+0x36
VOP_ADVLOCK(...) at netbsd:VOP_ADVLOCK+0x41
sys_flock(...) at netbsd:sys_flock+0xe2
syscall(...) at netbsd:syscall+0xaa

sleepq_block(...) at netbsd:sleepq_block+0xad
cv_timedwait_sig(...) at netbsd:cv_timedwait_sig+0x101
kevent1(...) at netbsd:kevent1+0x45a
sys___kevent50(...) at netbsd:sys___kevent50+0x45
syscall(...) at netbsd:syscall+0xaa
This is the normal state of things. The apache processes are normally
waiting in "lockf", "select", or "kqueue".
Interestingly, for the DEBUG/DIAGNOSTIC/LOCKDEBUG kernels, if I do that
in ddb, and then do "cont", I get this panic (even without your patch):
panic: SPL NOT LOWERED ON TRAP EXIT
cpu0: Begin traceback...
panic(c0102c0f,ca010011,c0110031,1d6a0011,28f0011,c0b5fe08,0,ca015f38,c066d005,2b)
at netbsd:panic+0x18
alltraps(c01eb01d,c0431702,5,1,6,ca015f94,c039cd28,c0ad7668,c066d005,1)
at netbsd:alltraps+0x17e
xencons_tty_input(c0ad7668,c066d005,1,6,ca015fa8,ca015f94,c01e5164,c054f800,c01eb01d,c043496a)
at netbsd:xencons_tty_input+0xb8
xencons_handler(c0ad7668,2,c0adc128,ca015fe8,c013ec6b,c0adc128,ca174ca8,ca015fec,c016fa87,a)
at netbsd:xencons_handler+0x78
intr_biglock_wrapper(c0adc128,ca174ca8,ca015fec,c016fa87,a,2,0,0,c0434968,40)
at netbsd:intr_biglock_wrapper+0x1f
evtchn_do_event(2,ca174ca8,ca174c54,0,0,0,0,0,0,0)
at netbsd:evtchn_do_event+0x16b
--- switch to interrupt stack ---
call_evtchn_do_event(ca174ca8,0,c0420011,ca170031,c0200011,11,c0b46d40,c04214c0,ca174d04,1)
at netbsd:call_evtchn_do_event+0x1e
hypervisor_callback(c0b48d20,0,0,ca174da0,c0b48d20,c0b48d20,c01e0cf0,c0b48d20,0,c01000a1)
at netbsd:hypervisor_callback+0x64
idle_loop(c0b48d20,67c000,c057b200,0,c010006d,0,0,0,0,0)
at netbsd:idle_loop+0x17c
cpu0: End traceback...
That doesn't prevent me from getting the trace on the processes, so it's
not a big problem.
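On the WCHAN check in the "ps" logs mentioned earlier: a tally of wait channels per process name could be produced with something like the Python sketch below. It is only an illustration under assumptions. The three-column layout (PID, WCHAN, COMMAND), the process name "httpd", and the sample lines are all made up for the example and are not taken from the real logs.

```python
from collections import Counter


def tally_wchan(ps_text, command="httpd"):
    """Count wait channels for lines whose command column equals `command`.

    Assumes a header line followed by whitespace-separated columns
    PID, WCHAN, COMMAND. Adjust the indices for the real log format.
    """
    counts = Counter()
    for line in ps_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or truncated line
        wchan, comm = fields[1], fields[2]
        if comm == command:
            counts[wchan] += 1
    return counts


# Made-up sample input, for illustration only.
SAMPLE = """\
  PID WCHAN  COMMAND
  101 lockf  httpd
  102 select httpd
  103 kqueue httpd
  104 lockf  httpd
  200 wait   sh
"""

if __name__ == "__main__":
    for wchan, n in tally_wchan(SAMPLE).most_common():
        print(wchan, n)
```

Any wait channel outside the usual "lockf", "select", and "kqueue" set would stand out in such a tally, although a thread stuck inside a kernel function does not necessarily show that function's name as its WCHAN, which is why the ddb backtraces are still needed.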
Thanks.
--
- Brian