tech-kern archive


Re: tstile syndrome



On Sat, Aug 29, 2009 at 09:55:58AM +0000, Andrew Doran wrote:
> On Thu, Aug 27, 2009 at 01:09:16PM +0200, Manuel Bouyer wrote:
> 
> > Hi,
> > here's what I found so far on a server that shows the tstile hang,
> > with some ddb+gdb playing.
> > 
> > Most processes are waiting on a turnstile (you did know that),
> > the one I started with had more than 4000 writers in the queue.
> > The threads did come here through a VOP_LOCK() (you did also know that).
> > This is a turnstile for a rwlock, I found the owner of this rwlock.
> > This thread is also waiting on a turnstile, but a different one,
> > it also did come here through a VOP_LOCK. This is also a turnstile for a
> > rwlock, which also has an owner, which also has VOP_LOCK in its stack
> > trace and is waiting on a turnstile. It's also a rwlock (I checked the
> > l_syncobj) but l_wchan is bogus: ffff800079ac402f, this is not a
> > valid krwlock_t* (and examining memory at this address doesn't look like
> > a valid krwlock_t value, and 'show lock' doesn't know about it either). 
> > 
> > any idea where to go from here ?
> 
> Are you running nullfs?  I suspect that with layerfs there are lock leaks
> and conditions where it continues to use another vnode's lock when that
> vnode has been freed or recycled and gained another identity.  Yet another
> reason to take vnode locking to the dump, cue objection from Holland.

No nullfs. Just plain ffs+wapbl.

Below is what I could get on a test server. I could reproduce this using
2 rsync processes, each run as
rsync -avH --delete --delete-excluded --delete-after --delay-updates --force --stats --partial
I ran both against rsync://ftp.fr.netbsd.org/, to which I have
direct connectivity: one rsync against
NetBSD/NetBSD-release-4-0/src or NetBSD/NetBSD-release-3-1/src
(switching between the two, so that it has something to update)
and one against
NetBSD/NetBSD-current/pkgsrc
It may require running rsync against an existing tree to update it,
instead of into an empty directory.

This box is still in ddb, tell me if there's something more I can get out
of it.

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 years of experience will always make the difference
--
horn:/home/bouyer>ps axl -O laddr | grep tile
20331  884 ce925a80  145 15872 117  0  9060  6584 tstile  D+   ttyp3  1:27.46 r
20331 1379 ce8ff540  884 25361 117  0  9060  5756 tstile  D+   ttyp3  3:44.48 r
20331 1046 ce925080  608 18917 113  0 10084  7572 tstile  D+   ttyp4  5:03.82 r
20331  546 ce923a60  398     0 117  0  2144  1904 tstile  Ds+  ttyp5  0:00.09 -

ce925a80 (wchan_t) 0xdde954f0 rw_owner 0xce8ff54f
ce8ff540 (wchan_t) 0xd6f6c898 rw_owner 0xce925a8f
ce925080 (wchan_t) 0xce818028 mutex_owner
ce923a60 (wchan_t) 0xd6f6c898 rw_owner 0xce925a8f
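
The table above can be read as a wait-for graph: each LWP sleeps on a lock, and the lock's rw_owner word names the LWP holding it. The following is a minimal Python sketch of that reading, not something run on the box. It assumes the low 4 bits of rw_owner are flag bits (consistent with 'show lock' reporting flags 0x0f next to owner ce8ff540); the names RW_FLAG_BITS, owner_lwp and find_cycle are made up for illustration. The mutex waiter has no owner listed here, so it is recorded as None.

```python
# Assumption: the low 4 bits of rw_owner carry flags, the rest is the LWP address.
RW_FLAG_BITS = 0x0F


def owner_lwp(rw_owner):
    """Strip the assumed flag bits to recover the owning LWP address."""
    return rw_owner & ~RW_FLAG_BITS


# lwp -> (wchan, raw owner word), copied from the table above.
waits = {
    0xCE925A80: (0xDDE954F0, 0xCE8FF54F),
    0xCE8FF540: (0xD6F6C898, 0xCE925A8F),
    0xCE925080: (0xCE818028, None),  # mutex, owner not shown
    0xCE923A60: (0xD6F6C898, 0xCE925A8F),
}


def find_cycle(waits, start):
    """Follow 'waits for owner' edges from start; return the cycle or None."""
    path = []
    lwp = start
    while lwp in waits and waits[lwp][1] is not None:
        if lwp in path:
            return path[path.index(lwp):]
        path.append(lwp)
        lwp = owner_lwp(waits[lwp][1])
    return None


if __name__ == "__main__":
    for lwp in waits:
        cycle = find_cycle(waits, lwp)
        shown = [hex(x) for x in cycle] if cycle else None
        print(hex(lwp), "->", shown)
```

Under that assumption, ce925a80 and ce8ff540 each wait on a lock owned by the other, and ce923a60 is queued behind the same pair.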

db{0}> tr/a ce925a80
trace: pid 884 lid 1 at 0xce98182c
sleepq_block(0,0,c062653c,c06822dc,dde954f0,c06eb500,cbf48664,71,40,1000001) at netbsd:sleepq_block+0xeb
turnstile_block(0,1,dde954f0,c06822dc,cd57fa40,c06eb240,ce9818fc,c032603f,c06f376c,0) at netbsd:turnstile_block+0x261
rw_vector_enter(dde954f0,1,4,c02fc57e,dde95450,2,ce98196c,c0294751,dde954f0,2) at netbsd:rw_vector_enter+0x2a1
vlockmgr(dde954f0,2,0,dde95450,dde95450,d3097d58,ce98199c,c037f8f0,ce98198c,c06eb240) at netbsd:vlockmgr+0x126
ffs_lock(ce98198c,c06eb240,ce98199c,c0379c54,c2dd464c,ce925a80,c04f95c0,dde95450,2,2) at netbsd:ffs_lock+0x41
VOP_LOCK(dde95450,2,0,d3097d00,d3097d00,dde95450,ce9819ec,c0367305,dde95450,2) at netbsd:VOP_LOCK+0x60
vn_lock(dde95450,2,c031a446,0,8,c04f8fc0,cd57fa40,ce981c14,d6f6c7f8,0) at netbsd:vn_lock+0xd8
cache_lookup(d6f6c7f8,ce981c14,ce981c28,c02fc57e,0,0,ce981a2c,c04d2a16,cd0f50c0,c06eb8e0) at netbsd:cache_lookup+0x1f5
ufs_lookup(ce981ad4,0,ce981aec,c03706b5,c04f8e80,d6f6c7f8,ce981c14,ce981c28,cea23415,d6f6c7f8) at netbsd:ufs_lookup+0xcc
VOP_LOOKUP(d6f6c7f8,ce981c14,ce981c28,c037f8f0,ce981b1c,ce925a80,ce981b2c,c0379c54,20,0) at netbsd:VOP_LOOKUP+0x8c
lookup(ce981c00,20002,400,ce981c1c,1,d6ecaf20,1,c03702bd,ce981c1c,ce981b9f) at netbsd:lookup+0x20b
namei(ce981c00,ce981c70,ce981c0c,c02e63f0,cbf46600,ffffffff,0,80996e0,0,0) at netbsd:namei+0x145
do_sys_stat(80996e0,0,ce981c70,c02fcac2,c06eb442,c02eabd6,0,120d,81a4,12c74) at netbsd:do_sys_stat+0x37
sys___lstat30(ce925a80,ce981d00,ce981d28,ce981d40,c03b6de2,d939ac64,1,80996e0,bfbfc94c,bfbfc038) at netbsd:sys___lstat30+0x29
syscall(ce981d48,b3,ab,1f,1f,80996e0,bfbfc94c,bfbfc038,80996e0,0) at netbsd:syscall+0xc8
db{0}> tr/a ce8ff540
trace: pid 1379 lid 1 at 0xce96288c
sleepq_block(0,0,c062653c,c06822dc,d6f6c898,c06eb3a0,cbf48698,71,40,1000001) at netbsd:sleepq_block+0xeb
turnstile_block(0,1,d6f6c898,c06822dc,cbf46600,2,0,0,0,0) at netbsd:turnstile_block+0x261
rw_vector_enter(d6f6c898,1,0,6,d6f6c7f8,2,ce9629cc,c0294751,d6f6c898,2) at netbsd:rw_vector_enter+0x2a1
vlockmgr(d6f6c898,2,4f6b,d6f6c7f8,d6f6c7f8,ce962c48,ce9629fc,c037f8f0,ce9629ec,6) at netbsd:vlockmgr+0x126
ffs_lock(ce9629ec,6,ce9629fc,c0379c54,c2dd464c,dde95450,c04f95c0,d6f6c7f8,2,20002) at netbsd:ffs_lock+0x41
VOP_LOCK(d6f6c7f8,2,c04f9600,d6ebe680,ce962bbc,dde95450,ce962b0c,c02a42cb,d6f6c7f8,20002) at netbsd:VOP_LOCK+0x60
vn_lock(d6f6c7f8,20002,cbf46600,8000,101c898,1ff,1ff,0,200,0) at netbsd:vn_lock+0xd8
wapbl_ufs_rename(ce962bbc,0,0,0,ce962c48,ce8ff540,ce962b4c,c03783b8,ceafe8a0,2) at netbsd:wapbl_ufs_rename+0x61b
ufs_rename(ce962bbc,0,ce962bdc,c037f886,ce962bcc,ce962c90,c04f9380,d6f6c7f8,d6ebe680,ce962c90) at netbsd:ufs_rename+0x30
VOP_RENAME(d6f6c7f8,d6ebe680,ce962c90,dde95450,0,ce962c48,cbf44e80,ce818000,0,1) at netbsd:VOP_RENAME+0x7c
do_sys_rename(bfbfca4c,80996e0,0,0,0,c067ff54,ce962d3c,c03cd4f8,ce8ff540,ce962d00) at netbsd:do_sys_rename+0x3c5
sys_rename(ce8ff540,ce962d00,ce962d28,ce962d40,c03b6de2,d939ad24,1,bfbfca4c,80996e0,ce8ff540) at netbsd:sys_rename+0x26
syscall(ce962d48,b3,ab,bfbf001f,bbbd001f,bfbfca4c,3,bfbfbdc8,80996e0,bfbfca4c) at netbsd:syscall+0xc8
db{0}> whatis 0xdde954f0
0xdde954f0 is 0xdde95000+1264 in POOL 'kvakernel' (allocated)
0xdde954f0 is 0xdde95450+160 in POOL 'vnodepl' (allocated)
0xdde954f0 is 0xddda0000+1004784 from VMMAP 0xc06d8bc0
db{0}> whatis 0xd6f6c898
0xd6f6c898 is 0xd6f6c000+2200 in POOL 'kvakernel' (allocated)
0xd6f6c898 is 0xd6f6c7f8+160 in POOL 'vnodepl' (allocated)
0xd6f6c898 is 0xd6940000+6473880 from VMMAP 0xc06d8bc0
db{0}> show lock 0xdde954f0
lock address : 0x00000000dde954f0 type     :     sleep/adaptive
initialized  : 0x00000000c036f560
shared holds :                  0 exclusive:                  1
shares wanted:                  0 exclusive:                  1
current cpu  :                  0 last held:                  0
current lwp  : 0x00000000cbf47c80 last held: 0x00000000ce8ff540
last locked  : 0x00000000c036d237 unlocked : 0x00000000c036d2d8
owner/count  : 0x00000000ce8ff540 flags    : 0x000000000000000f

Turnstile chain at 0xc06eb500.
=> Turnstile at 0xcbf4864c (wrq=0xcbf4865c, rdq=0xcbf48664).
=> 0 waiting readers:
=> 1 waiting writers: 0xce925a80
db{0}> show lock 0xd6f6c898
lock address : 0x00000000d6f6c898 type     :     sleep/adaptive
initialized  : 0x00000000c036f560
shared holds :                  0 exclusive:                  1
shares wanted:                  0 exclusive:                  2
current cpu  :                  0 last held:                  1
current lwp  : 0x00000000cbf47c80 last held: 0x00000000ce925a80
last locked  : 0x00000000c036d237 unlocked : 0x00000000c036d2d8
owner/count  : 0x00000000ce925a80 flags    : 0x000000000000000f

Turnstile chain at 0xc06eb3a0.
=> Turnstile at 0xcbf48680 (wrq=0xcbf48690, rdq=0xcbf48698).
=> 0 waiting readers:
=> 2 waiting writers: 0xce923a60 0xce8ff540
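
Putting the 'whatis' and 'show lock' output together: both lock addresses sit 160 bytes into a 'vnodepl' object, so each lock can be mapped back to its vnode and matched against the VOP_LOCK arguments in the two traces. A small Python sketch of that bookkeeping follows. The offset is taken from the 'whatis' lines only (not from a struct definition), and VNODE_LOCK_OFFSET, vnode_of and summarize are illustrative names.

```python
# Offset of the lock inside its vnode, as reported by 'whatis' (+160).
VNODE_LOCK_OFFSET = 160

# lock address -> owning LWP, from the 'owner/count' lines of 'show lock'.
owners = {0xDDE954F0: 0xCE8FF540, 0xD6F6C898: 0xCE925A80}

# LWP -> lock it sleeps on, from the turnstile_block() frames.
wanted = {0xCE925A80: 0xDDE954F0, 0xCE8FF540: 0xD6F6C898}


def vnode_of(lock_addr):
    """Map an embedded lock address back to the start of its vnode."""
    return lock_addr - VNODE_LOCK_OFFSET


def summarize(owners, wanted):
    """Return (lwp, vnode held, vnode wanted) tuples, sorted by LWP address."""
    held = {lwp: lock for lock, lwp in owners.items()}
    return [(lwp, vnode_of(held[lwp]), vnode_of(wanted[lwp]))
            for lwp in sorted(wanted)]


if __name__ == "__main__":
    for lwp, has, wants in summarize(owners, wanted):
        print(f"lwp {lwp:#x} holds vnode {has:#x}, wants vnode {wants:#x}")
```

Read this way, the rename thread (ce8ff540) holds the lock of vnode dde95450 and wants d6f6c7f8, while the lstat thread (ce925a80) holds d6f6c7f8 and wants dde95450, which looks like a lock-order reversal between the lookup and rename paths.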

