Subject: Re: port-sparc64/29473: nfs + bus_dmamap_load_mbuf often results in a hang
To: None <port-sparc64-maintainer@netbsd.org, gnats-admin@netbsd.org,>
From: john heasley <heas@shrubbery.net>
List: netbsd-bugs
Date: 03/03/2005 18:38:01
The following reply was made to PR port-sparc64/29473; it has been noted by GNATS.

From: john heasley <heas@shrubbery.net>
To: Andrey Petrov <petrov@netbsd.org>
Cc: john heasley <heas@shrubbery.net>, gnats-bugs@netbsd.org,
	port-sparc64-maintainer@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: port-sparc64/29473: nfs + bus_dmamap_load_mbuf often results in a hang
Date: Thu, 3 Mar 2005 18:37:53 +0000

 Wed, Feb 23, 2005 at 12:24:03PM -0800, Andrey Petrov:
 > On Wed, Feb 23, 2005 at 11:56:30AM -0800, john heasley wrote:
 > > oops, in such a rush today.  I meant to mention that I'll try poking it
 > > some more next week.  if you can think of anything specific that I should
 > > collect, let me know.
 > > 
 > 
 > At the moment I can think of '%tl-c .' for trap level and corresponding
 > kernel symbols for valid trap addresses (%TPC).
 
 Hi Andrey.  I collected some more info last night, but I do not see what
 is going wrong.
 
 first, the server does not respond to ping and getty does not respond on
 the console.  It does appear as if, from _load_mbuf, it enters pmap_extract,
 though it does not show up in the trace.
 
 lom>break
 kdb breakpoint at 11b1548
 Stopped in pid 406.1 (nfsd) at  netbsd:cpu_Debugger+0x4:        nop
 db> bt   
 intr_list_handler(3f07cc0, 6, e0017ed0, 38, 11a5a94, 0) at netbsd:intr_list_handler+0x10 
 sparc_interrupt(7, e80e000, e827128, 0, 0, e827450) at netbsd:sparc_interrupt+0x1d4
 _bus_dmamap_load_mbuf(3f15c00, 4449000, 3ee7d80, 401, ffffffffffffffef, e8275f0) at netbsd:_bus_dmamap_load_mbuf+0xa4
 gem_start(4444060, 16fc, 16f8, 3e, e8275f0, 44446d0) at netbsd:gem_start+0x84
 ether_output(0, 3ee7c80, 3ed2488, 800, 3ed8518, 40) at netbsd:ether_output+0x358
 ip_output(3ee7b80, 4444060, 3ed2480, 3ed2488, 0, 3ee7d60) at netbsd:ip_output+0x5c8
 udp_output(3ed2480, 3ed2420, c6, 10, 6, 3a) at netbsd:udp_output+0x254
 udp_usrreq(3ed0d80, 9, 3ed8b10, 3ee3e60, 0, dddd860) at netbsd:udp_usrreq+0x1f0
 sosend(0, 0, 0, 3ed8b10, 0, 0) at netbsd:sosend+0x3c4
 nfs_send(3ed0d80, 3ee3e60, 3ed8b10, 0, dddd860, 6000) at netbsd:nfs_send+0x9c
 nfssvc_nfsd(0, dddd860, ddd5700, e827bd0, 2, 183b5f0) at netbsd:nfssvc_nfsd+0x64c
 sys_nfssvc(0, e827dd0, e827dc0, 0, e827dd0, 0) at netbsd:sys_nfssvc+0x310
 syscall(e827ed0, 9b, 405369e0, e827dd0, 405369e0, 405369e4) at netbsd:syscall+0xd4
 ?(4, 202d78, 18, ffffffffffffcc50, 0, 0) at 0x1008cb8
 
 db> show reg
 tstate      0x1d000606
 pc          0x11b154c   cpu_Debugger+0x4
 npc         0x11b1550   cpu_Debugger+0x8
 ipl         0xc
 y           0
 g0          0
 g1          0x180b800   db_examine_format+0x10
 g2          0x1
 g3          0x181c90c   cn_magic
 g4          0xf9
 g5          0xf9
 g6          0
 g7          0x1067e8
 o0          0x1
 o1          0x1820c00   timeout_wheel+0x3cc8
 o2          0xa
 o3          0x1d
 o4          0x51bf1
 o5          0xc6
 o6          0xe0017471
 o7          0x106db14   comintr+0x688
 
 db> mach tf
 Trapframe 0x1848d00:    tstate: 1d000606        pc: 11b154c     npc: 11b1550
 y: 0    pil: 12 oldpil: 12      fault: 0        tt: 101 Globals:
 00000000044a8b00 000000000180b800 0000000000000001 000000000181c90c 
 00000000000000f9 00000000000000f9 0000000000000000 00000000001067e8
 outs:
 0000000000000001 0000000001820c00 000000000000000a 000000000000001d
 0000000000051bf1 00000000000000c6 00000000e0017471 000000000106db14
 db> mach stack
 Window 0 frame64 0xe0017c70 locals, ins:
 4440004 7fe 1819400 1812000 4441000 1819650 e0018000 e0018000
 42bd600 194e47a346f 194e4c67940 8000000000000000 194e4c67940 40 e0017561=sp 11a5aa4=pc:netbsd:intr_list_handler+0x10
 Window 1 frame64 0xe0017d60 locals, ins:
 0 10194 0 0 e827a08 3ed0d80 1 180f400
 3f07cc0 6 e0017ed0 38 11a5a94 0 e0017621=sp 1008fbc=pc:netbsd:sparc_interrupt+0x1d4
 Window 2 frame64 0xe0017e20 locals, ins:
 4482000603 11a6ed4 3f07d80 1000 1805408 0 c ffffffffffffffff
 7 e80e000 e827128 0 0 e827450 e826871=sp 11a6f58=pc:netbsd:_bus_dmamap_load_mbuf+0xa4
 
 ok %tl-c .  
 1
 ok .trap-registers
 %TL:1 %TT:17f %TPC:f0056e14 %TnPC:f0056e18
 %TSTATE:881d000405  %CWP:5
    %PSTATE:4 AG:0 IE:0 PRIV:1 AM:0 PEF:0 RED:0 MM:0 TLE:0 CLE:0 MG:0 IG:0
    %ASI:1d  %CCR:88  XCC:Nzvc   ICC:Nzvc
 
 %TL:2 %TT:98 %TPC:1008ec8 %TnPC:1008ecc
 %TSTATE:82000404  %CWP:4
    %PSTATE:4 AG:0 IE:0 PRIV:1 AM:0 PEF:0 RED:0 MM:0 TLE:0 CLE:0 MG:0 IG:0
    %ASI:82  %CCR:0  XCC:nzvc   ICC:nzvc
 
 %TL:3 %TT:68 %TPC:1005804 %TnPC:1005808
 %TSTATE:11001507  %CWP:7
    %PSTATE:15 AG:1 IE:0 PRIV:1 AM:0 PEF:1 RED:0 MM:0 TLE:0 CLE:0 MG:0 IG:0
    %ASI:11  %CCR:0  XCC:nzvc   ICC:nzvc
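 
 For reading the .trap-registers output above: the %ASI, %CCR, %PSTATE and
 %CWP lines are fields of the %TSTATE value.  A minimal sketch of the
 unpacking in C, assuming the SPARC V9 TSTATE layout with the 12-bit PSTATE
 field used on UltraSPARC.  The struct, macro and function names are made up
 for illustration and are not taken from the kernel or OpenBoot:

```c
#include <stdint.h>

/* Low PSTATE bits, in the order OpenBoot prints them (assumed layout). */
#define PSTATE_AG   0x01u   /* alternate globals */
#define PSTATE_IE   0x02u   /* interrupts enabled */
#define PSTATE_PRIV 0x04u   /* privileged mode */
#define PSTATE_AM   0x08u   /* 32-bit address masking */
#define PSTATE_PEF  0x10u   /* FPU enabled */

/* Hypothetical container for the fields packed into %tstate. */
struct tstate_fields {
	unsigned ccr;     /* bits 39:32, condition codes */
	unsigned asi;     /* bits 31:24, current ASI */
	unsigned pstate;  /* bits 19:8,  processor state */
	unsigned cwp;     /* bits 4:0,   current window pointer */
};

/* Split a raw %tstate value into its fields. */
static struct tstate_fields
decode_tstate(uint64_t tstate)
{
	struct tstate_fields f;

	f.ccr    = (unsigned)((tstate >> 32) & 0xff);
	f.asi    = (unsigned)((tstate >> 24) & 0xff);
	f.pstate = (unsigned)((tstate >> 8) & 0xfff);
	f.cwp    = (unsigned)(tstate & 0x1f);
	return f;
}
```

 With the values above, 0x881d000405 unpacks to CCR 0x88, ASI 0x1d,
 PSTATE 0x4 (PRIV only) and CWP 5, which matches what OpenBoot printed for
 TL 1.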
 
 The client gets to the point where it tries to mount its / and that is
 when the server hangs.  once the server reboots and the client finally
 retries the mount, the server hangs again.  The trace is always the same.
 
 I added a little instrumentation to _load_mbuf to see if it was actually
 entering pmap_extract.  I think it is:
 
 vaddr = 0x3ee7352 0xe809128
 pmap_extract: va=0x3ee7352 segs[0]=4000 segs[0][7]=7fea2000 segs[0][7][883]=800000007eb8f636 pseg_get: 7eb8e000
 vaddr = 0xdb5a000 0xe809128
 pmap_extract: va=0xdb5a000 segs[0]=4000 segs[0][27]=7ed96000 segs[0][27][429]=800000007eb17636 pseg_get: 7eb16000
 
 vaddr = 0xe7ce000 0xe809128
 pmap_extract: va=0xe7ce000 segs[0]=4000 segs[0][28]=7e69c000 segs[0][28][999]=800000007d0c9234 pseg_get: 7d0c8000
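 
 The indices in the output above are consistent with a three-level table
 walk over 8 KB pages with 1024 entries per level.  A minimal sketch in C
 that reproduces them from the virtual address.  The shift values are
 inferred from the printed indices, the physical-address mask assumes the
 TTE keeps PA bits 40:13, and the function names are chosen for illustration
 rather than copied from the pmap source:

```c
#include <stdint.h>

#define PTSHIFT  13       /* 8 KB pages: low 13 bits are the page offset */
#define PDSHIFT  23       /* next 10 bits index the page table */
#define STSHIFT  33       /* next 10 bits index the directory */
#define IDX_MASK 0x3ffu   /* 1024 entries per level */

/* Assumed: physical page number lives in TTE data bits 40:13. */
#define TTE_PA_MASK 0x000001ffffffe000ULL
#define PGOFSET     0x1fffULL

/* First index: segs[seg] */
static unsigned
va_to_seg(uint64_t va)
{
	return (unsigned)((va >> STSHIFT) & IDX_MASK);
}

/* Second index: segs[seg][dir] */
static unsigned
va_to_dir(uint64_t va)
{
	return (unsigned)((va >> PDSHIFT) & IDX_MASK);
}

/* Third index: segs[seg][dir][pte] */
static unsigned
va_to_pte(uint64_t va)
{
	return (unsigned)((va >> PTSHIFT) & IDX_MASK);
}

/* Physical page base from the TTE data word (flag bits stripped). */
static uint64_t
tte_page_base(uint64_t tte_data)
{
	return tte_data & TTE_PA_MASK;
}

/* Full physical address: page base plus the offset within the page. */
static uint64_t
tte_to_pa(uint64_t tte_data, uint64_t va)
{
	return tte_page_base(tte_data) | (va & PGOFSET);
}
```

 For va=0x3ee7352 this gives indices 0, 7 and 883, and the TTE
 800000007eb8f636 yields page base 0x7eb8e000, the same values the
 instrumentation printed.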
 
 I also added a few rudimentary checks from the x86 _load_mbuf:
 
 /* #ifdef DIAGNOSTIC*/
 #if 1
         if ((m->m_flags & M_PKTHDR) == 0)
                 panic("_bus_dmamap_load_mbuf: no packet header");
 #endif
 
         if (m->m_pkthdr.len > map->_dm_size)
                 return (EINVAL);
 
 and added BUS_DMA_ALLOCNOW to the flags of the bus_dmamap_create call.
 
 any ideas?