tech-kern archive


Re: netbsd-5 NFS(?) lock up



On Sun, Mar 29, 2009 at 09:49:58PM +0200, Manuel Bouyer wrote:
> Hi,
> trying to upgrade an x86 NFS server from netbsd-3 to netbsd-5 has been
> a fiasco. The kernel locks up within seconds after going multiuser, even
> with SMP disabled in the BIOS (the kernel indeed sees only one CPU).
> LOCKDEBUG doesn't help, the kernel is just dead, all I can do is enter
> ddb on console.
> 
> Here's what I've been able to collect so far (this is with hyperthreading
> enabled in BIOS so kernel sees 2 CPUs). Hardware is an Intel x86 with a 3GHz
> Xeon CPU (one of the first EM64T Xeons I think), 1GB RAM. Disk drives are
> 2 wd(4) behind a piixide and 6 sd(4) behind two esiop(4), raid-1 raidframe on
> all disks. raid-1 parity reconstruct is running when the lockup occurs;
> and I suspect some NFS activity too (maybe several 100s of requests/s). 
> There is also samba running, but this one should be almost idle.

I set up a test box with a similar configuration (unfortunately the hardware
is not 100% identical) and got a LOCKDEBUG panic:
Mutex error: lockdebug_wantlock: locking against myself

lock address : 0x00000000ce88c028 type     :     sleep/adaptive
initialized  : 0x00000000c03a5052
shared holds :                  0 exclusive:                  1
shares wanted:                  0 exclusive:                 11
current cpu  :                  1 last held:                  1
current lwp  : 0x00000000ce8edcc0 last held: 0x00000000ce8edcc0
last locked  : 0x00000000c03aeff4 unlocked : 0x00000000c025da80
owner field  : 0x00000000ce8edcc0 wait/spin:                1/0

Turnstile chain at 0xc0704e60.
=> Turnstile at 0xce955788 (wrq=0xce955798, rdq=0xce9557a0).
=> 0 waiting readers:
=> 10 waiting writers: 0xce943d00 0xce943300 0xce8f2560 0xce8ed7c0 0xce465020 0xce8f22e0 0xce8f2a60 0xce8eda40 0xce8f2ce0 0xce4652a0

panic: LOCKDEBUG
fatal breakpoint trap in supervisor mode
trap type 1 code 0 eip c03f0e2c cs 8 eflags 246 cr2 cdee3000 ilevel 0
Stopped in pid 261.1 (nfsd) at  netbsd:breakpoint+0x4:  popl    %ebp
db{1}> tr
breakpoint(c0641842,ce918728,c2cec800,c035aaaf,0,1,0,0,ce918728,8) at netbsd:breakpoint+0x4
panic(c0641844,c063d60e,c051629b,c063d5dd,b4a8,18edcc0,0,d08fef18,0,ce88c028) at netbsd:panic+0x1b0
lockdebug_abort1(c063d5dd,1,0,0,cbf524d0,ce8ede78,0,6,d0821438,ce918b10) at netbsd:lockdebug_abort1+0xbb
mutex_vector_enter(ce88c028,11,ce918b6c,c025cc47,ce88c000,0,cbf66300,ce44692c,c3b92c00,ce918b58) at netbsd:mutex_vector_enter+0x464
genfs_renamelock_enter(ce88c000,0,cbf66300,ce44692c,c3b92c00,ce918b58,ce918b54,ce918b44,ce8edcc0,0) at netbsd:genfs_renamelock_enter+0x14
nfsrv_rename(d0c8ca20,ce44692c,ce8edcc0,ce918bd0,cd117b40,c0701d58,0,c2cec918,c0701d58,0) at netbsd:nfsrv_rename+0x4b7
nfssvc_nfsd(ce918c38,804a2e0,ce8edcc0,0,0,0,0,0,0,ffffffff) at netbsd:nfssvc_nfsd+0x3d6
sys_nfssvc(ce8edcc0,ce918d00,ce918d28,bfbff000,ce478684,ce478684,2,4,804a2e0,bfbfee94) at netbsd:sys_nfssvc+0x332
syscall(ce918d48,b3,ab,bfbf001f,bbbd001f,11,1,bfbfee94,0,bfbffff0) at netbsd:syscall+0xc8
db{1}> mach cpu 0
using CPU 0
db{1}> tr
__cpu_simple_lock(c2dee000,0,c01002a7,0,c01002a7,0,0,0,0,0) at netbsd:__cpu_simple_lock+0xd
db{1}> ps /l 
PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
1480     1 3   1        84           cea4e000            raidctl nanoslp
443      1 3   1        84           cea03ac0            raidctl nanoslp
1443     1 3   1        84           cea4e280               tcsh pause
1760     1 3   1        84           ce9540a0                ksh pause
567      1 3   1        84           cea4ea00               tcsh pause
570      1 3   1        84           cea4ec80                top select
558      1 3   0        80           cea1d0e0               tcsh pause
556      1 3   1        84           cea1d360       screen-4.0.3 select
559      1 3   1        84           cea1d5e0       screen-4.0.3 pause
289      1 3   0        80           cea1d860               tcsh pause
446      1 3   1        84           cea1dae0               sshd select
465      1 3   0        80           cea1dd60               sshd netio
504      1 3   0        80           cea030c0              getty ttyraw
409      1 3   1        80           cea035c0              getty ttyraw
414      1 3   1        80           ce326860              getty ttyraw
509      1 3   1        84           ce326ae0              getty ttyraw
501      1 3   1        84           cea03340               cron nanoslp
502      1 3   1        84           ce954320              inetd kqueue
397      1 3   0        84           ce465a20                 sh wait
490      1 3   1        80           ce943080                 sh wait
358      1 3   1        84           cea03d40             smartd nanoslp
319      1 3   1        84           ce9545a0           sendmail pause
435      1 3   0        80           ce954aa0               sshd select
332      1 2   1   1000004           ce954820               ntpd
98       1 3   1        84           ce954d20          rpc.lockd select
285      1 3   1        84           ce4657a0          rpc.statd select
276      1 3   0         4           ce943300               nfsd tstile
270      1 3   0         4           ce943580               nfsd tstile
279      1 3   0         4           ce943800               nfsd tstile
282      1 2   1         4           ce943a80               nfsd
278      1 3   1         4           ce943d00               nfsd tstile
208      1 3   0         4           ce8f2060               nfsd tstile
271      1 3   1         4           ce8f22e0               nfsd tstile
280      1 3   1         4           ce8f2560               nfsd tstile
265      1 3   0         4           ce8f27e0               nfsd tstile
264      1 3   1         4           ce8f2a60               nfsd tstile
277      1 3   1         4           ce8f2ce0               nfsd tstile
266      1 3   0         4           ce8ed040               nfsd tstile
251      1 3   0         4           ce8ed2c0               nfsd tstile
275      1 3   0         4           ce8ed540               nfsd tstile
274      1 3   0         4           ce8ed7c0               nfsd tstile
259      1 3   0         4           ce8eda40               nfsd tstile
261  >   1 7   1         4           ce8edcc0               nfsd
263      1 3   1         4           ce465020               nfsd tstile
249      1 3   0         4           ce4652a0               nfsd tstile
260      1 3   0         4           ce413280               nfsd tstile
252      1 3   0        84           ce413000               nfsd select
237      1 3   1        84           ce465520             mountd select
203      1 3   1        84           ce413780            rpcbind select
159      1 3   0        84           ce413500            syslogd kqueue
134      1 3   0        84           ce3265e0           dhclient select
1        1 3   1        84           cbf76aa0               init wait
0       62 3   0       204           cea03840        raid_parity rfwcond
              58 3   1       204           ce465ca0            physiod physiod
           >  57 7   0       204           ce413a00            raidio3
              56 2   0       204           ce413c80              raid3
              55 3   1       204           ce3260e0            raidio2 raidiow
              54 3   1       204           ce326360              raid2 rfwcond
              53 3   0       204           ce326d60        vmem_rehash vmem_rehash
              52 3   0       204           ce3220c0           aiodoned aiodoned
              51 3   0     40204           ce322340            ioflush syncer
              50 3   1       204           ce3225c0           pgdaemon pgdaemon
              49 3   1       204           ce322840            raidio1 raidiow
              48 3   1       204           ce322ac0              raid1 rfwcond
              47 3   0       204           ce322d40            raidio0 raidiow
              46 3   0       204           cbf760a0              raid0 rfwcond
              45 3   0       204           cbf75300          cryptoret crypto_wait
              42 3   0       204           cbf75080               usb2 usbevt
              41 3   1       204           cbf75800               usb3 usbevt
              40 3   1       204           cbf75580               usb0 usbevt
              39 3   0       204           cbf76320         usbtask-dr usbtsk
              38 3   0       204           cbf76d20         usbtask-hc usbtsk
              37 3   1       204           cbf76820               usb1 usbevt
              36 3   0       204           cbf765a0              unpgc unpgc
              27 3   1       204           cbf75a80               iic0 iicintr
              26 3   0       204           cbf75d00            atabus3 atath
              25 3   1       204           cbf74060            atabus2 atath
              24 3   1       204           cbf742e0            atabus1 atath
              23 3   0       204           cbf74560            atabus0 atath
              22 3   0       204           cbf747e0           scsibus9 sccomp
              21 3   1       204           cbf74a60           scsibus8 sccomp
              20 3   0       204           cbf74ce0               pms0 pmsreset
              19 3   1       204           cbf72040               apm0 apmev
              18 3   1       204           cbf722c0            xcall/1 xcall
              17 1   1       204           cbf72540          softser/1
              16 1   1       204           cbf727c0          softclk/1
              15 1   1       204           cbf72a40          softbio/1
              14 1   1       204           cbf72cc0          softnet/1
              13 1   1       205           cbf6a020             idle/1
              12 3   0       204           cbf6a2a0             sysmon smtaskq
              11 3   0       204           cbf6a520           pmfevent pmfevent
              10 3   0       204           cbf6a7a0           nfssilly nfssilly
               9 3   1       204           cbf6aa20            cachegc cachegc
               8 3   1       204           cbf6aca0              vrele vrele
               7 3   0       204           cbf67000            xcall/0 xcall
               6 1   0       204           cbf67280          softser/0
               5 1   0       204           cbf67500          softclk/0
               4 1   0       204           cbf67780          softbio/0
               3 1   0       204           cbf67a00          softnet/0
               2 1   0       205           cbf67c80             idle/0
               1 3   0       204           c0699ee0            swapper schedule

db{1}> tr/a 0xce413a00
trace: pid 0 lid 57 at 0xce436d2c

The box is still in ddb; is there anything else I should check?
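
One check that can be done directly on the LOCKDEBUG dump above: "locking
against myself" should mean that the LWP asking for the mutex already owns it,
and indeed "current lwp" and "owner field" are both 0xce8edcc0 (pid 261, the
nfsd stopped in nfsrv_rename). Below is a minimal Python sketch of that
comparison. The function name is_self_deadlock is made up for illustration, and
it assumes the owner field is printed as a bare LWP address, as it appears in
this dump.

```python
import re

# Two lines copied from the LOCKDEBUG dump in this message.
SAMPLE = """\
current lwp  : 0x00000000ce8edcc0 last held: 0x00000000ce8edcc0
owner field  : 0x00000000ce8edcc0 wait/spin:                1/0
"""


def is_self_deadlock(dump: str) -> bool:
    """Return True if the LWP requesting the lock is also its owner.

    Assumption: 'owner field' holds a plain LWP address with no flag
    bits mixed in, which is how it looks in the dump above.
    """
    cur = re.search(r"current lwp\s*:\s*(0x[0-9a-fA-F]+)", dump)
    own = re.search(r"owner field\s*:\s*(0x[0-9a-fA-F]+)", dump)
    if cur is None or own is None:
        raise ValueError("dump lacks 'current lwp' or 'owner field'")
    # Compare numerically so zero padding and letter case do not matter.
    return int(cur.group(1), 16) == int(own.group(1), 16)


if __name__ == "__main__":
    print(is_self_deadlock(SAMPLE))
```

If the two addresses differed, the panic would point to ordinary contention
or a stale owner rather than a recursive acquisition in the rename path.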

-- 
Manuel Bouyer, LIP6, Universite Paris VI.           
Manuel.Bouyer%lip6.fr@localhost
     NetBSD: 26 years of experience will always make the difference
--

