kern/42228: Kernel deadlock prevents further file system access

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost
Subject: kern/42228: Kernel deadlock prevents further file system access
From: tron%zhadum.org.uk@localhost
Date: Sun, 25 Oct 2009 12:10:01 +0000 (UTC)

>Number:         42228
>Category:       kern
>Synopsis:       Kernel deadlock prevents further file system access
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Oct 25 12:10:00 +0000 2009
>Originator:     Matthias Scheler
>Release:        NetBSD 5.0_STABLE sources from 2009-10-11
>Organization:
Matthias Scheler                                  http://zhadum.org.uk/
>Environment:
System: NetBSD colwyn.zhadum.org.uk 5.0_STABLE NetBSD 5.0_STABLE (COLWYN.64) 
#1: Sun Oct 11 21:22:04 BST 2009 
tron%colwyn.zhadum.org.uk@localhost:/src/sys/compile/COLWYN.64 amd64
Architecture: x86_64
Machine: amd64
>Description:
My NetBSD/amd64 5.0/5.0_STABLE server has locked up on multiple occasions.
The symptoms are always identical:
1.) The system still responds to ICMP Echo packets.
2.) Services that rarely access the file system, e.g. BIND, still
    work.
3.) Any other processes (Postfix, Apache, inetd, etc.) are stuck.
I have to recover the machine by dropping into "ddb" and using "sync"
or "reboot" to force a restart.

The system managed to write a crash dump (to "raid0b", in case it
matters) after several of the crashes, but "savecore" always claims
there is no crash dump once the system is back up. The only debugging
information I therefore have is the stack traces of several of the
hung processes after the latest incident. The stack traces all
look like this:

sleepq_block
turnstile_block
rw_vector_enter
vlockmgr
VOP_LOCK
vn_lock
namei

The function after "namei" differs from case to case. Examples are
vn_open() or do_sys_stat().
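Read bottom-up, the trace shows each process inside namei() trying to lock a vnode (vn_lock -> VOP_LOCK -> vlockmgr), finding the underlying rwlock already held, and going to sleep (rw_vector_enter -> turnstile_block -> sleepq_block). The following userland Python analogy, which is not kernel code, sketches why one holder that never releases such a lock stalls every process that touches the same path while network-only services keep running. The names vnode_lock, lookup and run_lookups are hypothetical, and a plain exclusive lock stands in for the kernel rwlock.

```python
import threading

# Hypothetical stand-in for a vnode lock. In the trace above the real lock is
# an rwlock reached via vn_lock() -> VOP_LOCK() -> vlockmgr() -> rw_vector_enter().
vnode_lock = threading.Lock()

def lookup(path, timeout):
    """Hypothetical namei()-like lookup that must lock the vnode to proceed.

    Returns True if the lock was obtained within `timeout` seconds, or False
    if the caller would still be asleep (like a thread parked in sleepq_block).
    """
    if vnode_lock.acquire(timeout=timeout):
        try:
            return True  # the actual lookup work would happen here
        finally:
            vnode_lock.release()
    return False

def run_lookups(n, timeout):
    """Start n concurrent lookups and record whether each one completed."""
    results = [None] * n

    def worker(i):
        results[i] = lookup(f"/file{i}", timeout)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# A holder that never lets go (e.g. because it is itself waiting on something
# that never happens) stalls every other lookup of that vnode.
vnode_lock.acquire()
stuck = run_lookups(3, timeout=0.1)
vnode_lock.release()

# Once the holder releases the lock, the same lookups complete.
ok = run_lookups(3, timeout=0.1)

print("while held:", stuck)
print("after release:", ok)
```

The sketch only illustrates the symptom. It says nothing about which thread holds the lock in the real system or why, which is exactly the information the kernel stack traces above do not yet reveal.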

If someone sends me instructions for producing more useful debugging
information, I will of course provide it.

>How-To-Repeat:
I'm not exactly sure what triggers it. Contributing factors might be:
- "/tmp" on tmpfs
- amd(8)
  I use amd(8) to manage "/home" and three more top-level directory
  hierarchies. The configuration looks like this:

        [global]
        auto_attrcache  = 1
        search_path     = /etc/amd
        unmount_on_exit = yes

        [ /home ]
        map_name =      amd.home
        map_type =      file

        [ /share ]
        map_name =      amd.share
        map_type =      file

        [ /scratch ]
        map_name =      amd.scratch
        map_type =      file

        [ /volumes ]
        map_name =      volumes
        map_type =      file
- rTorrent
  rTorrent's mmap()-based I/O handling caused problems with WAPBL in the
  past, mostly when rTorrent was downloading files. During most of my
  lock-ups, however, it was only seeding.
- WAPBL
  This is unlikely, because reducing the number of file systems that use
  WAPBL didn't help. The only remaining file systems with WAPBL turned
  on should have been idle during the last lock-up.

Running rTorrent and/or WAPBL might be one of the contributing factors.
Using amd(8) could also be a reason.

>Fix:
Not known.

Follow-Ups:
- Re: kern/42228: Kernel deadlock prevents further file system access
  - From: Manuel Bouyer
