tech-kern archive


Re: tstile syndrome



On Thu, Aug 27, 2009 at 01:09:16PM +0200, Manuel Bouyer wrote:
> Hi,
> here's what I found so far on a server that shows the tstile hang,
> with some ddb+gdb playing.
> 
> Most processes are waiting on a turnstile (you did know that),
> the one I started with had more than 4000 writers in the queue.
> The threads did come here through a VOP_LOCK() (you did also know that).
> This is a turnstile for a rwlock, I found the owner of this rwlock.
> This thread is also waiting on a turnstile, but a different one,
> it also did come here through a VOP_LOCK. This is also a turnstile for a
> rwlock, which also has an owner, which also has VOP_LOCK in its stack
> trace and is waiting on a turnstile. It's also a rwlock (I checked the
> l_syncobj) but l_wchan is bogus: ffff800079ac402f, this is not a
> valid krwlock_t* (and examining memory at this address doesn't look like
> a valid krwlock_t value, and 'show lock' doesn't know about it either).

I think I mixed up pointers and values at one point.
I got another instance of the tstile deadlock and I think I found the
cause:

ffff800079987800 wchan_t 0xffff80008e572958 syncobj 0xffffffff806cf280 rw
owner 0xffff8000d47a3baf

ffff8000d47a3ba0 wchan_t 0xffff80008f301290 syncobj 0xffffffff806cf280 rw
owner 0xffff80007998780f

So ffff800079987800 is waiting on a lock held by ffff8000d47a3ba0, and
ffff8000d47a3ba0 is waiting on a lock held by ffff800079987800 (these are
the owner fields above with the low flag bits masked off).
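
The reasoning above can be sketched in a few lines of Python. This is only
an illustration, not kernel code: the 4-bit flag mask and the names
FLAG_MASK, owner_thread and find_wait_cycle are assumptions made for the
example, and the table is filled in by hand from the ddb output. It strips
the flag bits from each owner word and follows the "blocked on a lock owned
by" links until a thread repeats.

```python
# Assumption: the low 4 bits of a rwlock owner word are flag bits and the
# remaining bits are the address of the owning thread. The mask value is
# illustrative, not copied from the kernel headers.
FLAG_MASK = 0xF


def owner_thread(owner_word):
    """Strip the flag bits to recover the owning thread address."""
    return owner_word & ~FLAG_MASK


# Thread address -> raw owner word of the rwlock that thread is blocked on
# (values taken from the ddb output above).
blocked_on_owner = {
    0xffff800079987800: 0xffff8000d47a3baf,
    0xffff8000d47a3ba0: 0xffff80007998780f,
}


def find_wait_cycle(start, blocked):
    """Follow thread -> lock owner links; return the cycle, or None."""
    path = []
    seen = set()
    t = start
    while t in blocked and t not in seen:
        seen.add(t)
        path.append(t)
        t = owner_thread(blocked[t])
    if t in seen:
        # We came back to a thread already on the path: that is the cycle.
        return path[path.index(t):]
    return None


cycle = find_wait_cycle(0xffff800079987800, blocked_on_owner)
print([hex(t) for t in cycle])
```

With the two entries above, the walk returns to its starting thread after
two steps, which is the two-thread deadlock described here.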

Here are the stack traces for both processes:
db{0}> tr/a ffff800079987800
trace: pid 21115 lid 1 at 0xffff80007931c710
sleepq_block() at netbsd:sleepq_block+0xec
turnstile_block() at netbsd:turnstile_block+0x29e
rw_vector_enter() at netbsd:rw_vector_enter+0x28c
vlockmgr() at netbsd:vlockmgr+0xf6
VOP_LOCK() at netbsd:VOP_LOCK+0x64
vn_lock() at netbsd:vn_lock+0xd9
wapbl_ufs_rename() at netbsd:wapbl_ufs_rename+0x5ab
ufs_rename() at netbsd:ufs_rename+0x39
VOP_RENAME() at netbsd:VOP_RENAME+0x75
do_sys_rename() at netbsd:do_sys_rename+0x57d
syscall() at netbsd:syscall+0xb6      
db{0}> tr/a ffff8000d47a3ba0
trace: pid 25624 lid 1 at 0xffff8000d47cb650
sleepq_block() at netbsd:sleepq_block+0xec
turnstile_block() at netbsd:turnstile_block+0x29e
rw_vector_enter() at netbsd:rw_vector_enter+0x28c
vlockmgr() at netbsd:vlockmgr+0xf6    
VOP_LOCK() at netbsd:VOP_LOCK+0x64    
vn_lock() at netbsd:vn_lock+0xd9      
cache_lookup() at netbsd:cache_lookup+0x201
ufs_lookup() at netbsd:ufs_lookup+0xcd
VOP_LOOKUP() at netbsd:VOP_LOOKUP+0x80
lookup() at netbsd:lookup+0x34b       
namei() at netbsd:namei+0x1a4
do_sys_stat() at netbsd:do_sys_stat+0x44
sys___lstat30() at netbsd:sys___lstat30+0x2a
syscall() at netbsd:syscall+0xb6      

Any idea how to fix this?

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 years of experience will always make the difference