tech-kern archive


Re: tstile syndrome

On Sat, Aug 29, 2009 at 09:55:58AM +0000, Andrew Doran wrote:
> On Thu, Aug 27, 2009 at 01:09:16PM +0200, Manuel Bouyer wrote:
> > Hi,
> > here's what I found so far on a server that shows the tstile hang,
> > with some ddb+gdb playing.
> > 
> > Most processes are waiting on a turnstile (you did know that),
> > the one I started with had more than 4000 writers in the queue.
> > The threads did come here through a VOP_LOCK() (you did also know that).
> > This is a turnstile for a rwlock, I found the owner of this rwlock.
> > This thread is also waiting on a turnstile, but a different one,
> > it also did come here through a VOP_LOCK. This is also a turnstile for a
> > rwlock, which also has an owner, which also has VOP_LOCK in its stack
> > trace and is waiting on a turnstile. It's also a rwlock (I checked the
> > l_syncobj) but l_wchan is bogus: ffff800079ac402f, this is not a
> > valid krwlock_t* (and examining memory at this address doesn't look like
> > a valid krwlock_t value, and 'show lock' doesn't know about it either).
> > 
> > Any idea where to go from here?
> Are you running nullfs?  I suspect that with layerfs there are lock leaks
> and conditions where it continues to use another vnode's lock when that
> vnode has been freed or recycled and gained another identity.  Yet another
> reason to take vnode locking to the dump, cue objection from Holland.

No nullfs. Just plain ffs+wapbl.

Below is what I could get on a test server. I could reproduce this using
2 rsync processes with
rsync -avH --delete --delete-excluded --delete-after --delay-updates --force 
--stats --partial
I ran both against rsync:// to which I have
direct connectivity, one rsync against
NetBSD/NetBSD-release-4-0/src or NetBSD/NetBSD-release-3-1/src
(switch between the two, so that it has something to update)
and one against
It may require running rsync against an existing tree to update it,
instead of in an empty directory.

This box is still in ddb, tell me if there's something more I can get out
of it.

Manuel Bouyer <>
     NetBSD: 26 years of experience will always make the difference
horn:/home/bouyer>ps axl -O laddr | grep tile
20331  884 ce925a80  145 15872 117  0  9060  6584 tstile  D+   ttyp3  1:27.46 r
20331 1379 ce8ff540  884 25361 117  0  9060  5756 tstile  D+   ttyp3  3:44.48 r
20331 1046 ce925080  608 18917 113  0 10084  7572 tstile  D+   ttyp4  5:03.82 r
20331  546 ce923a60  398     0 117  0  2144  1904 tstile  Ds+  ttyp5  0:00.09 -

ce925a80 (wchan_t) 0xdde954f0 rw_owner 0xce8ff54f
ce8ff540 (wchan_t) 0xd6f6c898 rw_owner 0xce925a8f
ce925080 (wchan_t) 0xce818028 mutex_owner
ce923a60 (wchan_t) 0xd6f6c898 rw_owner 0xce925a8f
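
The rw_owner values above end in 0xf, while the LWP addresses in the ps
output end in 0. A minimal sketch, assuming the low four bits of the owner
word are flag bits (the 'show lock' output reports flags 0xf separately from
the owner), that recovers the owning LWP for each waiter. FLAG_MASK and
owner_lwp are illustrative names, not kernel identifiers.

```python
# Assumption: the low 4 bits of an rwlock owner word are flag bits,
# so the owning LWP address is the word with those bits cleared.
FLAG_MASK = 0xF  # hypothetical name, for illustration only


def owner_lwp(rw_owner: int) -> int:
    """Strip the assumed flag bits from an rw_owner word."""
    return rw_owner & ~FLAG_MASK


# (waiting LWP, wchan, rw_owner) copied from the table above.
waiters = [
    (0xCE925A80, 0xDDE954F0, 0xCE8FF54F),
    (0xCE8FF540, 0xD6F6C898, 0xCE925A8F),
    (0xCE923A60, 0xD6F6C898, 0xCE925A8F),
]

for lwp, wchan, raw in waiters:
    print(f"{lwp:#x} waits on {wchan:#x}, held by {owner_lwp(raw):#x}")
```

Under that assumption the first two LWPs each wait on a lock held by the
other.
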

db{0}> tr/a ce925a80
trace: pid 884 lid 1 at 0xce98182c
sleepq_block(0,0,c062653c,c06822dc,dde954f0,c06eb500,cbf48664,71,40,1000001) at 
 at netbsd:turnstile_block+0x261
at netbsd:rw_vector_enter+0x2a1
 at netbsd:vlockmgr+0x126
 at netbsd:ffs_lock+0x41
at netbsd:VOP_LOCK+0x60
vn_lock(dde95450,2,c031a446,0,8,c04f8fc0,cd57fa40,ce981c14,d6f6c7f8,0) at 
 at netbsd:cache_lookup+0x1f5
 at netbsd:ufs_lookup+0xcc
 at netbsd:VOP_LOOKUP+0x8c
lookup(ce981c00,20002,400,ce981c1c,1,d6ecaf20,1,c03702bd,ce981c1c,ce981b9f) at 
namei(ce981c00,ce981c70,ce981c0c,c02e63f0,cbf46600,ffffffff,0,80996e0,0,0) at 
do_sys_stat(80996e0,0,ce981c70,c02fcac2,c06eb442,c02eabd6,0,120d,81a4,12c74) at 
 at netbsd:sys___lstat30+0x29
syscall(ce981d48,b3,ab,1f,1f,80996e0,bfbfc94c,bfbfc038,80996e0,0) at 
db{0}> tr/a ce8ff540
trace: pid 1379 lid 1 at 0xce96288c
sleepq_block(0,0,c062653c,c06822dc,d6f6c898,c06eb3a0,cbf48698,71,40,1000001) at 
turnstile_block(0,1,d6f6c898,c06822dc,cbf46600,2,0,0,0,0) at 
rw_vector_enter(d6f6c898,1,0,6,d6f6c7f8,2,ce9629cc,c0294751,d6f6c898,2) at 
 at netbsd:vlockmgr+0x126
 at netbsd:ffs_lock+0x41
 at netbsd:VOP_LOCK+0x60
vn_lock(d6f6c7f8,20002,cbf46600,8000,101c898,1ff,1ff,0,200,0) at 
at netbsd:wapbl_ufs_rename+0x61b
 at netbsd:ufs_rename+0x30
 at netbsd:VOP_RENAME+0x7c
 at netbsd:do_sys_rename+0x3c5
 at netbsd:sys_rename+0x26
at netbsd:syscall+0xc8
db{0}> whatis 0xdde954f0
0xdde954f0 is 0xdde95000+1264 in POOL 'kvakernel' (allocated)
0xdde954f0 is 0xdde95450+160 in POOL 'vnodepl' (allocated)
0xdde954f0 is 0xddda0000+1004784 from VMMAP 0xc06d8bc0
db{0}> whatis 0xd6f6c898
0xd6f6c898 is 0xd6f6c000+2200 in POOL 'kvakernel' (allocated)
0xd6f6c898 is 0xd6f6c7f8+160 in POOL 'vnodepl' (allocated)
0xd6f6c898 is 0xd6940000+6473880 from VMMAP 0xc06d8bc0
db{0}> show lock 0xdde954f0
lock address : 0x00000000dde954f0 type     :     sleep/adaptive
initialized  : 0x00000000c036f560
shared holds :                  0 exclusive:                  1
shares wanted:                  0 exclusive:                  1
current cpu  :                  0 last held:                  0
current lwp  : 0x00000000cbf47c80 last held: 0x00000000ce8ff540
last locked  : 0x00000000c036d237 unlocked : 0x00000000c036d2d8
owner/count  : 0x00000000ce8ff540 flags    : 0x000000000000000f

Turnstile chain at 0xc06eb500.
=> Turnstile at 0xcbf4864c (wrq=0xcbf4865c, rdq=0xcbf48664).
=> 0 waiting readers:
=> 1 waiting writers: 0xce925a80
db{0}> show lock 0xd6f6c898
lock address : 0x00000000d6f6c898 type     :     sleep/adaptive
initialized  : 0x00000000c036f560
shared holds :                  0 exclusive:                  1
shares wanted:                  0 exclusive:                  2
current cpu  :                  0 last held:                  1
current lwp  : 0x00000000cbf47c80 last held: 0x00000000ce925a80
last locked  : 0x00000000c036d237 unlocked : 0x00000000c036d2d8
owner/count  : 0x00000000ce925a80 flags    : 0x000000000000000f

Turnstile chain at 0xc06eb3a0.
=> Turnstile at 0xcbf48680 (wrq=0xcbf48690, rdq=0xcbf48698).
=> 0 waiting readers:
=> 2 waiting writers: 0xce923a60 0xce8ff540
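
Read together, the two 'show lock' outputs give a wait-for relation: each
waiting writer points at the owner of the lock it is queued on. A short
sketch that follows those edges and reports a cycle if one exists; the
function name find_cycle is hypothetical, and the edges are transcribed by
hand from the output above.

```python
def find_cycle(waits_for, start):
    """Follow waiter -> owner edges from start.

    Returns the list of LWPs forming a cycle, or None if the chain
    ends at an LWP that is not itself waiting.
    """
    path = []
    seen = {}
    node = start
    while node in waits_for:
        if node in seen:
            return path[seen[node]:]
        seen[node] = len(path)
        path.append(node)
        node = waits_for[node]
    return None


# waiter LWP -> owner LWP, from the 'show lock' output above
waits_for = {
    0xCE925A80: 0xCE8FF540,  # lock 0xdde954f0
    0xCE8FF540: 0xCE925A80,  # lock 0xd6f6c898
    0xCE923A60: 0xCE925A80,  # lock 0xd6f6c898
}

cycle = find_cycle(waits_for, 0xCE923A60)
print([hex(lwp) for lwp in cycle] if cycle else "no cycle")
```

Starting from 0xce923a60, the walk ends up alternating between 0xce925a80
(the lstat/lookup trace) and 0xce8ff540 (the rename trace).
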
