Current-Users archive

Re: 5.99.42/i386 crash (backtrace + core available)

On Sat, Jan 08, 2011 at 11:16:19PM +0000, David Holland wrote:
> On Tue, Dec 28, 2010 at 05:37:43PM +0100, Dennis den Brok wrote:
>  > rw_abort()
>  > rw_vector_enter(df829668, ...)
>                    ^^^^^^^^
>  > genfs_lock()
>  > layer_bypass()
>  > VOP_LOCK(e15c2170,2)
>             ^^^^^^^^
>  > vclean()
>  > getcleanvnode()
>  > getnewvnode()
>  > ffs_vget()
>  > ufs_lookup()
>  > VOP_LOOKUP(df8295c8,...)
>               ^^^^^^^^
> Unfortunately most of the things visible in the stack trace are vnode
> op argument structures and not pointers to anything interesting.
> However, since rw_vector_enter is passed &vp->v_lock, I think we can
> tentatively conclude that it's trying to lock the same vnode that was
> passed to VOP_LOOKUP, and it's failing because that's quite properly
> already locked.
> It looks like what happened is that ffs went to get a fresh vnode and
> got a not-recently-used nullfs vnode. However, the nullfs vnode turned
> out to be the nullfs vnode sitting on top of the ffs vnode it was
> already working with. Since these share locks now, the vnode was
> locked even though not recently used (and on the list to be cleaned
> and all that), and in fact it turned out to be the same ffs vnode this
> process was already working on, so trying to lock it for cleaning blew
> up.
> So this seems like fallout from Juergen's layer locking cleanup from a
> few months ago. Not sure what the proper solution is, though.

While the analysis looks OK, I don't think the layer locking cleanup is
the reason.  Before the cleanup, locks were shared, so getcleanvnode()
would have used the same lock even without going through layer_bypass().

Dennis, to be sure, you could build a kernel and userland from sources
dated somewhere in September 2010; that is after all of my changes to
vnode locking.

Juergen Hannken-Illjes - TU Braunschweig
