NetBSD-Bugs archive


kern/45093: kernel deadlock between TCP and UVM involving callouts



>Number:         45093
>Category:       kern
>Synopsis:       kernel deadlock between TCP and UVM involving callouts
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Jun 21 16:45:00 +0000 2011
>Originator:     Manuel Bouyer
>Release:        NetBSD 5.1
>Organization:
>Environment:
System: NetBSD armandeche.soc.lip6.fr 5.1 NetBSD 5.1 (GENERIC) #0: Sun Nov 7 14:39:56 UTC 2010 builds%b6.netbsd.org@localhost:/home/builds/ab/netbsd-5-1-RELEASE/i386/201011061943Z-obj/home/builds/ab/netbsd-5-1-RELEASE/src/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
>Description:
        A deadlock in the NFS server is easy to reproduce on my server
        here. When the NFS server closes a socket, it calls
        uvm_unloanpage()
        (through soclose->sodisconnect->sodopendfree->sodopendfreel)
        with softnet_lock held. uvm_unloanpage() can then kpause().
        If a network callout (e.g. a TCP timer) fires while the nfsd
        thread is paused, the callout blocks trying to get softnet_lock
        and the softclock thread goes to sleep. As a result the kpause()
        is never woken up, and we have a deadlock: the softclock thread
        waits for softnet_lock, and the thread holding softnet_lock
        waits to be woken up by the softclock thread.

        More details and stack trace in
        http://mail-index.netbsd.org/tech-kern/2011/06/17/msg010734.html
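
        The cycle described above can be illustrated with a small
        user-space toy model. This is only a sketch: none of the types
        or functions below are NetBSD kernel APIs, and the model assumes
        a single softclock thread that runs pending callouts strictly
        in order and stops at the first one that blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not kernel interfaces. */
enum callout_kind { CALLOUT_TCP_TIMER, CALLOUT_KPAUSE_WAKEUP };

struct model {
    bool softnet_lock_held; /* true while the nfsd thread holds the lock */
    bool nfsd_paused;       /* true while nfsd sleeps in its kpause() stand-in */
};

/*
 * Run the pending callouts in order, as one softclock thread would.
 * Returns the number of callouts completed before the thread blocks
 * (or n if it never blocks).
 */
size_t softclock_run(struct model *m, const enum callout_kind *queue, size_t n)
{
    size_t done = 0;

    assert(m != NULL && queue != NULL);
    for (size_t i = 0; i < n; i++) {
        if (queue[i] == CALLOUT_TCP_TIMER) {
            /* The TCP timer needs softnet_lock; if nfsd holds it,
             * softclock blocks here and later callouts never run. */
            if (m->softnet_lock_held)
                return done;
        } else {
            /* The wakeup ends the pause. The model simplifies what
             * follows: nfsd resumes and releases the lock. */
            m->nfsd_paused = false;
            m->softnet_lock_held = false;
        }
        done++;
    }
    return done;
}

/* Deadlock: nfsd still sleeps with the lock held and nothing can wake it. */
bool model_deadlocked(const struct model *m)
{
    assert(m != NULL);
    return m->nfsd_paused && m->softnet_lock_held;
}
```

        With the TCP timer queued ahead of the wakeup, softclock_run()
        completes nothing and the model stays deadlocked; with the order
        reversed, both callouts complete.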


>How-To-Repeat:
        Run an NFS server with some NFS activity, some local activity
        (so that there is contention on vnode locks and uvm_unloanpage()
        has to sleep), and enough network activity to keep TCP callouts
        pending.
>Fix:
        Workaround: either disable page loaning in the NFS server, or
        change uvm_unloanpage() to use yield() instead of kpause()
        (the latter has been confirmed to work around the issue).

        A longer-term fix is to avoid long sleeps in threads holding
        softnet_lock. For this specific case, maybe sodopendfree can be
        transferred to another thread, or the socket's lock (which is
        softnet_lock for TCP sockets) can be dropped before calling
        sodopendfree and re-taken afterwards.
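
        The "drop the lock, do the work, re-lock" idea could look
        roughly like the user-space sketch below. Everything here is
        hypothetical: toy_lock stands in for softnet_lock, may_sleep_fn
        for work such as sodopendfree, and probe_cb only records whether
        the lock was held while it ran. It is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for softnet_lock; not a kernel mutex. */
struct toy_lock { bool held; };

void toy_lock_enter(struct toy_lock *l) { assert(!l->held); l->held = true; }
void toy_lock_exit(struct toy_lock *l)  { assert(l->held);  l->held = false; }

/* Hypothetical type for work that may sleep for a long time. */
typedef void (*may_sleep_fn)(void *arg);

/*
 * Must be called with the lock held; returns with the lock held.
 * The lock is released around fn() so that other threads (in the PR,
 * the softclock thread) can take it while fn() sleeps.
 */
void run_unlocked(struct toy_lock *l, may_sleep_fn fn, void *arg)
{
    assert(l != NULL && fn != NULL);
    toy_lock_exit(l);
    fn(arg);
    toy_lock_enter(l);
}

/* Example callback: records the lock state it observed when it ran. */
struct probe {
    const struct toy_lock *lock;
    bool ran;
    bool lock_was_held;
};

void probe_cb(void *arg)
{
    struct probe *p = arg;

    p->ran = true;
    p->lock_was_held = p->lock->held;
}
```

        One consequence of this pattern is that anything protected by
        the lock may change while it is dropped, so the caller would
        have to re-check such state after re-locking.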

        sokva_reclaim_callback() and sokvareserve() may have the same issue,
        if called with the socket locked.


