Subject: panics and lockups in -current
To: None <current-users@netbsd.org>
From: Paul Dokas <dokas@cs.umn.edu>
List: current-users
Date: 02/18/2005 15:31:37
On Thu, 17 Feb 2005 22:34:26 -0600 Paul Dokas <dokas@cs.umn.edu> wrote:
>   Xspllower(7,c0fdff00,ffffffff,286,c0dea000) at netbsd:Xspllower+0xe
>   m_freem(c0d3f500,0,52,c2507634,c0d3f500) at netbsd:m_freem+0x99
>   fxp_start(c0dea044,c047aa9c,c0dea044,2,ca517024) at netbsd:fxp_start+0x2c4
>   ether_output(c0dea044,c2506000,c0fe1d98,c0fc5df0,c2586000) at netbsd:ether_output+0x2dc
>   ip_output(c2508000,0,c03fa1f4,1,8) at netbsd:ip_output+0x621
>   ip_forward(c2506000,0,c0f7a000,1,0) at netbsd:ip_forward+0x16a
>   ip_input(c2506000,0,0,246,0) at netbsd:ip_input+0x27b
>   ipintr(928a0010,50030,cdba0010,c0470010,c0477000) at netbsd:ipintr+0x76
>   DDB lost frame for netbsd:Xsoftnet+0x41, trying 0xc047ae80
>   Xsoftnet() at netbsd:Xsoftnet+0x41
> 
> 
> The final nail in this for me is that I swapped out the Intel NICs for a 3COM:
> 
>   ex0 at pci2 dev 8 function 0: 3Com 3c905C-TX 10/100 Ethernet with mngmt (rev. 0x78)
>   ex0: interrupting at irq 10
>   ex0: MAC address 00:04:75:c7:b4:b7
>   exphy0 at ex0 phy 24: 3Com internal media interface
>   exphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> 
> and haven't had any problems since.


Following up on my own followup.


I declared victory too soon.  The machine locked up today with the following
on the screen:

  ex0: too many segments, retrying
  ex0: uplistptr was 0

Dropping into DDB, the stack trace was very similar to what's shown above:

  Stopped in pid 10.1 (pagedaemon) at  netbsd:cpu_Debugger+0x4
  db> bt
  .
  .
  .
  Xspllower(7,c0e09700,ffffffff,282,70000) at netbsd:Xspllower+0xe
  m_freem(c0e09700,c0ef3800,cdb4048c,cd64e854,ccfdd854) at netbsd:m_freem+0x99
  ex_intr(c0ef0000,0,10,6e860030,30010) at netbsd:ex_intr+0x16a
  Xintr_legacy10() at netbsd:Xintr_legacy10+0xad
  --- interrupt ---
  lockmgr(c04dbe20,ce7fc000,ce800000,202,4215689c) at netbsd:lockmgr
  uvm_swapout(cddbb088,0,ccfd4f3c,c02bc8c3,97) at netbsd:uvm_swapout+0x8d
  uvm_swapout_threads(0,0,c0437134,54,8e10) at netbsd:uvm_swapout_threads+0xbb
  uvmpd_scan(0,0,55566d7e,8c14,0) at netbsd:uvmpd_scan+0x1c9
  uvm_pageout(cc66b4a4,56c000,574000,0,c0100321) at netbsd:uvm_pageout+0xdb

Looking at the kernel in gdb, it appears that the call to Xspllower is in
the MBUFLOCK macro within the MFREE macro at uipc_mbuf.c line 454.  The
actual call that seems to be deadlocked is the "splx(ms)" at sys/mbuf.h
line 334.
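
For anyone not familiar with that code path, here is a rough userland sketch
of the pattern I believe MBUFLOCK follows: raise the interrupt priority
level, run the protected code, then restore the old level with splx(ms).
This is not the kernel source. Everything with a _sim or _SIM suffix is a
hypothetical stand-in, the priority value is arbitrary, and the name of the
raise call is my assumption; only the "splx(ms)" step is taken from what gdb
showed.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's current interrupt priority level. */
static int cur_ipl = 0;

/* Arbitrary level for this sketch; not the real kernel value. */
#define IPL_VM_SIM 5

/* Raise the simulated level and return the previous one (assumed shape). */
static int splraise_sim(void)
{
	int old = cur_ipl;

	if (cur_ipl < IPL_VM_SIM)
		cur_ipl = IPL_VM_SIM;
	return old;
}

/*
 * Restore a previously saved level.  In the real kernel, lowering the
 * level is where pending interrupts get a chance to run, which is where
 * Xspllower shows up in the traces.
 */
static void splx_sim(int s)
{
	cur_ipl = s;
}

/* Sketch of the lock macro: raise, run the code, restore with splx(ms). */
#define MBUFLOCK_SIM(code)			\
do {						\
	int ms = splraise_sim();		\
	{ code }				\
	splx_sim(ms);				\
} while (0)

static int free_count = 0;	/* number of simulated mbuf frees */
static int ipl_seen_inside = -1;	/* level observed inside the lock */

/* Simulated free path: the bookkeeping runs at the raised level. */
static void free_one_sim(void)
{
	MBUFLOCK_SIM(
		ipl_seen_inside = cur_ipl;
		free_count++;
	);
}
```

Calling free_one_sim() once leaves free_count at 1 and cur_ipl back at 0,
with ipl_seen_inside showing the raised level.  The point is only that the
splx(ms) at the end is the last step of every free, so a hang there is
consistent with both traces ending in m_freem -> Xspllower.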

Please, can _anyone_ shed some light on how this can be fixed?  I now have
two machines running -current that are locking up or panicking at least
once a day each.  I'm more than willing to provide all of the debugging
information needed to solve this.  Unfortunately, I will be out of town and
away from email for the next 5 days, so debugging will have to resume then.

Paul
-- 
Paul Dokas                                            dokas@cs.umn.edu
======================================================================
Don Juan Matus:  "an enigma wrapped in mystery wrapped in a tortilla."