NetBSD-Bugs archive

kern/42228: Kernel deadlock prevents further file system access

>Number:         42228
>Category:       kern
>Synopsis:       Kernel deadlock prevents further file system access
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Oct 25 12:10:00 +0000 2009
>Originator:     Matthias Scheler
>Release:        NetBSD 5.0_STABLE sources from 2009-10-11
>Environment:
System: NetBSD 5.0_STABLE NetBSD 5.0_STABLE (COLWYN.64) #1: Sun Oct 11 21:22:04 BST 2009 amd64
Architecture: x86_64
Machine: amd64
>Description:
My NetBSD/amd64 5.0/5.0_STABLE server has locked up on multiple occasions.
The symptoms are always identical:
1.) The system still responds to ICMP Echo packets.
2.) Services that don't (frequently) access the file system, e.g.
    BIND, still work.
3.) All other processes (Postfix, Apache, inetd, etc.) are stuck.
I have to recover the machine by dropping into "ddb" and using "sync"
or "reboot" to force a restart.

The system managed to write a crash dump (to "raid0b", in case it
matters) after several of the crashes, but "savecore" always claims
there is no crash dump once the system is back up. The only debug
information I have is therefore the stack traces of several of
the hung processes after the latest incident. The stack traces all
look like this:


The function after "namei" differs from case to case. Examples are
vn_open() or do_sys_stat().
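
Since the stuck processes all appear to be blocked in path lookup on the
way to an open or stat, a small userland probe could help record when the
hang starts. The following is only a rough sketch under assumptions: the
function name probe(), the "ok"/"error"/"hung" status strings, the
5-second timeout and the example paths are all hypothetical and not part
of the original report. It runs stat() and open() in a child process so
that the monitoring process itself does not block if the lookup never
returns.

```python
import subprocess
import sys

# Child program: stat() and open() both make the kernel resolve the path
# name, which is where the hung processes in the report were stuck.
_CHILD = (
    "import os, sys\n"
    "p = sys.argv[1]\n"
    "os.stat(p)\n"
    "os.close(os.open(p, os.O_RDONLY))\n"
)


def probe(path, timeout=5.0):
    """Return "ok", "error" or "hung" for a lookup of `path`.

    Hypothetical helper: "hung" only means the child did not finish within
    `timeout` seconds. Note that the child interpreter itself has to be
    loaded from disk, so a hang there is reported as "hung" as well.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", _CHILD, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        status = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Do not kill and wait here: a child blocked inside the kernel may
        # not react to signals, and waiting for it would block the probe.
        return "hung"
    return "ok" if status == 0 else "error"


if __name__ == "__main__":
    # Example paths only; adjust them to the mount points of interest.
    for path in ("/", "/tmp", "/home"):
        print(path, probe(path))
```

Run periodically with the results sent somewhere that does not need the
local file system, such a probe would at best narrow down the time of a
lockup. It cannot replace a kernel stack trace or a crash dump.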

If someone sends me instructions for producing more useful debugging
information, I will of course provide it.

I'm not exactly sure what triggers it. Contributing factors might be:
- "/tmp" on tmpfs
- amd(8)
  I use amd(8) for managing "/home" and three more top level directory
  hierarchies. The configuration looks like this:

        auto_attrcache  = 1
        search_path     = /etc/amd
        unmount_on_exit = yes

        [ /home ]
        map_name =      amd.home
        map_type =      file

        [ /share ]
        map_name =      amd.share
        map_type =      file

        [ /scratch ]
        map_name =      amd.scratch
        map_type =      file

        [ /volumes ]
        map_name =      volumes
        map_type =      file
- rTorrent
  rTorrent's mmap()-based I/O handling caused problems with WAPBL in the
  past, mostly when rTorrent was downloading files. During most of my
  lockups, however, it was only seeding.
  WAPBL itself is an unlikely cause because reducing the number of file
  systems which use WAPBL didn't help. The only remaining file systems
  with WAPBL turned on should have been idle during the last lockup.

>How-To-Repeat:
Running rTorrent and/or WAPBL might
be one of the contributing factors. Using amd(8) could also be a reason.

>Fix:
Not known.
