Subject: Re: panics and lockups in -current
To: Paul Dokas <dokas@cs.umn.edu>
From: Patrick Welche <prlw1@newn.cam.ac.uk>
List: current-users
Date: 02/24/2005 12:10:17
On Fri, Feb 18, 2005 at 03:31:37PM -0600, Paul Dokas wrote:
...
> I declared victory too soon. The machine locked up today with the following
> on the screen:
>
> ex0: too many segments, retrying
> ex0: uplistptr was 0
>
> dropping into DDB, the stack trace was very similar to what's shown above:
>
> Stopped in pid 10.1 (pagedaemon) at netbsd:cpu_Debugger+0x4
> db> bt
> .
> .
> .
> Xspllower(7,c0e09700,ffffffff,282,70000) at netbsd:Xspllower+0xe
> m_freem(c0e09700,c0ef3800,cdb4048c,cd64e854,ccfdd854) at netbsd:m_freem+0x99
> ex_intr(c0ef0000,0,10,6e860030,30010) at netbsd:ex_intr+0x16a
> Xintr_legacy10() at netbsd:Xintr_legacy10+0xad
> --- interrupt ---
> lockmgr(c04dbe20,ce7fc000,ce800000,202,4215689c) at netbsd:lockmgr
> uvm_swapout(cddbb088,0,ccfd4f3c,c02bc8c3,97) at netbsd:uvm_swapout+0x8d
> uvm_swapout_threads(0,0,c0437134,54,8e10) at netbsd:uvm_swapout_threads+0xbb
> uvmd_scan(0,0,55566d7e,8c14,0) at netbsd:uvmd_scan+0x1c9
> uvm_pageout(cc66b4a4,56c000,574000,0,c0100321) at netbsd:uvm_pageout+0xdb
>
> Looking at the kernel in gdb, it appears that the call to Xspllower is in
> the MBUFLOCK macro within the MFREE macro at uipc_mbuf.c line 454. The
> actual call that seems to be deadlocked is the "splx(ms)" at sys/mbuf.h
> line 334.
I posted something just like this:
http://mail-index.netbsd.org/current-users/2005/02/15/0011.html
and today I found the same box with a slightly different backtrace:
ex0: too many segments, retrying
ex0: uplistptr was 0
ex1: uplistptr was 0
kernel: page fault trap, code=0
Stopped at netbsd:arpintr+0xc4: movzbl 0x34(%eax),%eax
db> bt
arpintr(c9e80010,30,f0010,c0480010,c054f000) at netbsd:arpintr+0xc4
DDB lost frame for netbsd:Xsoftnet+0x34, trying 0xc0552df0
Xsoftnet() at netbsd:Xsoftnet+0x34
--- interrupt ---
0x246:
db>
So, apparently the same outcome, but with three different traces. I am hopeful
that this may be the fix:
----------------------------
revision 1.38
date: 2005/02/24 08:04:02; author: martin; state: Exp; lines: +3 -3
Fix the size of psc_regs (0x3c >> 2 is the biggest index used to access
it now, so pick 0x40 >> 2). Fixes "Bug 6", reported by Ted Unangst on
tech-kern.
----------------------------
and hopefully the menace behind
http://mail-index.netbsd.org/netbsd-help/2005/02/01/0009.html
http://mail-index.netbsd.org/current-users/2004/10/13/0010.html
too. (ISTR someone having problems with an ex0 plugged into a 10Mbit/s switch,
but I couldn't find the reference.)
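
As an aside, the sizing arithmetic in that commit message can be sketched in C.
This is only an illustration, not the committed code: the helper names are made
up, and it assumes the registers are 32-bit words indexed by (byte offset >> 2),
which the log message implies but does not spell out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: map a register byte offset to an array index,
 * assuming 4-byte registers (hence the >> 2). */
static size_t reg_index(size_t byte_offset)
{
	return byte_offset >> 2;
}

/* Smallest element count for which the highest index is still valid. */
static size_t regs_needed(size_t max_byte_offset)
{
	return reg_index(max_byte_offset) + 1;
}

/* Non-zero if an access at byte_offset stays inside an array of nelems. */
static int index_in_bounds(size_t byte_offset, size_t nelems)
{
	return reg_index(byte_offset) < nelems;
}

/* Checks the numbers quoted in the log message. */
static void check_commit_arithmetic(void)
{
	/* 0x3c >> 2 is the biggest index used ... */
	assert(reg_index(0x3c) == 15);
	/* ... so the array needs 16 entries, i.e. 0x40 >> 2. */
	assert(regs_needed(0x3c) == (0x40 >> 2));
	assert(index_in_bounds(0x3c, 0x40 >> 2));
	/* An array of only 0x3c >> 2 entries would be one element short. */
	assert(!index_in_bounds(0x3c, 0x3c >> 2));
}
```

Calling check_commit_arithmetic() from a test program passes silently. The
point is just that an array sized to the largest index, rather than the largest
index plus one, gets written one element past its end, which is the kind of
overrun that could plausibly corrupt neighbouring state.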
Now to try to crash the box, apply the above patch, and then try to crash it again.
In passing: very few of us have the problem, but it is very noticeable
to those who do, so it isn't a general problem with ex0. Both
boxes which crash for me are gateways, and both are running ipf - is yours?
Cheers,
Patrick