Subject: Re: port-sparc64/29473: nfs + bus_dmamap_load_mbuf often results in a hang
To: Andrey Petrov <petrov@netbsd.org>
From: john heasley <heas@shrubbery.net>
List: netbsd-bugs
Date: 03/03/2005 18:37:53
Wed, Feb 23, 2005 at 12:24:03PM -0800, Andrey Petrov:
> On Wed, Feb 23, 2005 at 11:56:30AM -0800, john heasley wrote:
> > oops, in such a rush today.  I meant to mention that I'll try poking it
> > some more next week.  if you can think of anything specific that I should
> > collect, let me know.
> > 
> 
> At the moment I can think of '%tl-c .' for trap level and corresponding
> kernel symbols for valid trap addresses (%TPC).

Hi Andrey.  I collected some more info last night, but I do not see what
is going wrong.

first, the server does not respond to ping and getty does not respond on
the console.  It does appear as if, from _load_mbuf, it enters pmap_extract,
though pmap_extract does not show up in the trace.

lom>break
kdb breakpoint at 11b1548
Stopped in pid 406.1 (nfsd) at  netbsd:cpu_Debugger+0x4:        nop
db> bt   
intr_list_handler(3f07cc0, 6, e0017ed0, 38, 11a5a94, 0) at netbsd:intr_list_handler+0x10 
sparc_interrupt(7, e80e000, e827128, 0, 0, e827450) at netbsd:sparc_interrupt+0x1d4
_bus_dmamap_load_mbuf(3f15c00, 4449000, 3ee7d80, 401, ffffffffffffffef, e8275f0) at netbsd:_bus_dmamap_load_mbuf+0xa4
gem_start(4444060, 16fc, 16f8, 3e, e8275f0, 44446d0) at netbsd:gem_start+0x84
ether_output(0, 3ee7c80, 3ed2488, 800, 3ed8518, 40) at netbsd:ether_output+0x358
ip_output(3ee7b80, 4444060, 3ed2480, 3ed2488, 0, 3ee7d60) at netbsd:ip_output+0x5c8
udp_output(3ed2480, 3ed2420, c6, 10, 6, 3a) at netbsd:udp_output+0x254
udp_usrreq(3ed0d80, 9, 3ed8b10, 3ee3e60, 0, dddd860) at netbsd:udp_usrreq+0x1f0
sosend(0, 0, 0, 3ed8b10, 0, 0) at netbsd:sosend+0x3c4
nfs_send(3ed0d80, 3ee3e60, 3ed8b10, 0, dddd860, 6000) at netbsd:nfs_send+0x9c
nfssvc_nfsd(0, dddd860, ddd5700, e827bd0, 2, 183b5f0) at netbsd:nfssvc_nfsd+0x64c
sys_nfssvc(0, e827dd0, e827dc0, 0, e827dd0, 0) at netbsd:sys_nfssvc+0x310
syscall(e827ed0, 9b, 405369e0, e827dd0, 405369e0, 405369e4) at netbsd:syscall+0xd4
?(4, 202d78, 18, ffffffffffffcc50, 0, 0) at 0x1008cb8

db> show reg
tstate      0x1d000606
pc          0x11b154c   cpu_Debugger+0x4
npc         0x11b1550   cpu_Debugger+0x8
ipl         0xc
y           0
g0          0
g1          0x180b800   db_examine_format+0x10
g2          0x1
g3          0x181c90c   cn_magic
g4          0xf9
g5          0xf9
g6          0
g7          0x1067e8
o0          0x1
o1          0x1820c00   timeout_wheel+0x3cc8
o2          0xa
o3          0x1d
o4          0x51bf1
o5          0xc6
o6          0xe0017471
o7          0x106db14   comintr+0x688

db> mach tf
Trapframe 0x1848d00:    tstate: 1d000606        pc: 11b154c     npc: 11b1550
y: 0    pil: 12 oldpil: 12      fault: 0        tt: 101 Globals:
00000000044a8b00 000000000180b800 0000000000000001 000000000181c90c 
00000000000000f9 00000000000000f9 0000000000000000 00000000001067e8
outs:
0000000000000001 0000000001820c00 000000000000000a 000000000000001d
0000000000051bf1 00000000000000c6 00000000e0017471 000000000106db14
db> mach stack
Window 0 frame64 0xe0017c70 locals, ins:
4440004 7fe 1819400 1812000 4441000 1819650 e0018000 e0018000
42bd600 194e47a346f 194e4c67940 8000000000000000 194e4c67940 40 e0017561=sp 11a5aa4=pc:netbsd:intr_list_handler+0x10
Window 1 frame64 0xe0017d60 locals, ins:
0 10194 0 0 e827a08 3ed0d80 1 180f400
3f07cc0 6 e0017ed0 38 11a5a94 0 e0017621=sp 1008fbc=pc:netbsd:sparc_interrupt+0x1d4
Window 2 frame64 0xe0017e20 locals, ins:
4482000603 11a6ed4 3f07d80 1000 1805408 0 c ffffffffffffffff
7 e80e000 e827128 0 0 e827450 e826871=sp 11a6f58=pc:netbsd:_bus_dmamap_load_mbuf+0xa4

ok %tl-c .  
1
ok .trap-registers
%TL:1 %TT:17f %TPC:f0056e14 %TnPC:f0056e18
%TSTATE:881d000405  %CWP:5
   %PSTATE:4 AG:0 IE:0 PRIV:1 AM:0 PEF:0 RED:0 MM:0 TLE:0 CLE:0 MG:0 IG:0
   %ASI:1d  %CCR:88  XCC:Nzvc   ICC:Nzvc

%TL:2 %TT:98 %TPC:1008ec8 %TnPC:1008ecc
%TSTATE:82000404  %CWP:4
   %PSTATE:4 AG:0 IE:0 PRIV:1 AM:0 PEF:0 RED:0 MM:0 TLE:0 CLE:0 MG:0 IG:0
   %ASI:82  %CCR:0  XCC:nzvc   ICC:nzvc

%TL:3 %TT:68 %TPC:1005804 %TnPC:1005808
%TSTATE:11001507  %CWP:7
   %PSTATE:15 AG:1 IE:0 PRIV:1 AM:0 PEF:1 RED:0 MM:0 TLE:0 CLE:0 MG:0 IG:0
   %ASI:11  %CCR:0  XCC:nzvc   ICC:nzvc

The client gets to the point where it tries to mount its / and that is
when the server hangs.  once the server reboots and the client finally
retries the mount, the server hangs again.  The trace is always the same.

I added a little instrumentation to _load_mbuf to see if it was actually
entering pmap_extract.  I think it is:

vaddr = 0x3ee7352 0xe809128
pmap_extract: va=0x3ee7352 segs[0]=4000 segs[0][7]=7fea2000 segs[0][7][883]=800000007eb8f636 pseg_get: 7eb8e000
vaddr = 0xdb5a000 0xe809128
pmap_extract: va=0xdb5a000 segs[0]=4000 segs[0][27]=7ed96000 segs[0][27][429]=800000007eb17636 pseg_get: 7eb16000

vaddr = 0xe7ce000 0xe809128
pmap_extract: va=0xe7ce000 segs[0]=4000 segs[0][28]=7e69c000 segs[0][28][999]=800000007d0c9234 pseg_get: 7d0c8000
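
For what it is worth, the numbers in that output look self-consistent.  Below
is a minimal sketch of the arithmetic, assuming 8 KB pages, 10 index bits per
table level, and a TTE physical-address field in bits 40..13.  The macro and
function names are made up for illustration; they are not the ones in pmap.c.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed layout: 8 KB pages, 10 index bits per page-table level. */
#define PG_SHIFT    13                      /* log2(8 KB) */
#define IDX_BITS    10
#define IDX_MASK    ((1u << IDX_BITS) - 1)
#define PT_SHIFT    PG_SHIFT                /* third-level index starts at bit 13 */
#define PD_SHIFT    (PT_SHIFT + IDX_BITS)   /* second-level index starts at bit 23 */
#define ST_SHIFT    (PD_SHIFT + IDX_BITS)   /* first-level index starts at bit 33 */

/* Assumed TTE data layout: physical page address in bits 40..13. */
#define TTE_PA_MASK 0x000001ffffffe000ULL

/* First-level (segment) index of a virtual address. */
unsigned va_to_seg(uint64_t va) { return (unsigned)((va >> ST_SHIFT) & IDX_MASK); }

/* Second-level (directory) index of a virtual address. */
unsigned va_to_dir(uint64_t va) { return (unsigned)((va >> PD_SHIFT) & IDX_MASK); }

/* Third-level (page table entry) index of a virtual address. */
unsigned va_to_pte(uint64_t va) { return (unsigned)((va >> PT_SHIFT) & IDX_MASK); }

/* Physical page address held in a TTE data word. */
uint64_t tte_to_pa(uint64_t tte) { return tte & TTE_PA_MASK; }

/* Print the decoded indexes and page address for one va/TTE pair. */
void decode(uint64_t va, uint64_t tte)
{
        printf("va=0x%llx idx=[%u][%u][%u] page=0x%llx\n",
            (unsigned long long)va, va_to_seg(va), va_to_dir(va),
            va_to_pte(va), (unsigned long long)tte_to_pa(tte));
}
```

Calling decode(0x3ee7352, 0x800000007eb8f636ULL) gives indexes 0, 7, 883 and
page 0x7eb8e000, which matches the first pmap_extract line above; the other
two lines work out the same way (27/429 and 28/999).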

I also added a few rudimentary checks from the x86 _load_mbuf:

/* #ifdef DIAGNOSTIC*/
#if 1
        if ((m->m_flags & M_PKTHDR) == 0)
                panic("_bus_dmamap_load_mbuf: no packet header");
#endif

        if (m->m_pkthdr.len > map->_dm_size)
                return (EINVAL);

and added BUS_DMA_ALLOCNOW to the bus_dmamap_create call.

any ideas?