kern/47514: Multiple dump -X triggers kernel panic in fss_ioctl

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost
Subject: kern/47514: Multiple dump -X triggers kernel panic in fss_ioctl
From: tsugutomo.enami%jp.sony.com@localhost
Date: Wed, 30 Jan 2013 03:00:01 +0000 (UTC)

>Number:         47514
>Category:       kern
>Synopsis:       Multiple dump -X triggers kernel panic in fss_ioctl
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jan 30 03:00:01 +0000 2013
>Originator:     enami tsugutomo
>Release:        NetBSD 6.0_STABLE
>Organization:
>Environment:
System: NetBSD rplaca.sm.sony.co.jp 6.0_STABLE NetBSD 6.0_STABLE (GENERIC) #2: Mon Jan 7 16:53:59 JST 2013 enami%sigfpe.sm.sony.co.jp@localhost:/home/enami/src/netbsd-6/obj.amd64/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:

I recently updated amanda from pkgsrc (from a version a few years
old), and the kernel has been panicking since then.  It looks like
the amanda in pkgsrc gained the ability to use dump -X, where
possible, last summer.

Here is the panic message and stacktrace (copied by hand):

uvm_fault(0xfffffe80bda3bd40, 0x0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff804bf1af cs 8 rflags 10283 cr2 8 cpl 0 rsp fffffe8006d59820
kernel: page fault trap, code=0
Stopped in pid 1713.1 (dump) at netbsd:mutex_vector_enter+0x80: movq 18(%r15), %rax
db{0}> bt
mutex_vector_enter() at netbsd:mutex_vector_enter+0x80
fss_ioctl() at netbsd:fss_ioctl+0xed
VOP_IOCTL() at netbsd:VOP_IOCTL+0x3b
vn_ioctl() at netbsd:vn_ioctl+0x76
sys_ioctl() at netbsd:sys_ioctl+0x13c
syscall() at netbsd:syscall+0xc4
db{0}>

The value of %r15 is fffffffffffffff0
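
The register value can be cross-checked against the fault address
(cr2 8) in the trap message.  The following is a minimal sketch of
that arithmetic, assuming the hand-copied displacement "18" in
"movq 18(%r15), %rax" is hexadecimal (0x18); fault_address is a
hypothetical helper written for this note, not kernel code.

```c
#include <stdint.h>

/*
 * Effective address of "movq disp(%r15), %rax".
 * Assumption: the displacement copied from the ddb output is hex (0x18).
 */
static uint64_t
fault_address(uint64_t r15, uint64_t disp)
{
    /* Unsigned 64-bit addition wraps around, as address arithmetic does. */
    return r15 + disp;
}
```

Under that assumption, fault_address(0xfffffffffffffff0, 0x18) wraps
around to 0x8, which agrees with the cr2 value reported above.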

With my amanda configuration, up to 8 dumps run in parallel.
The system has two CPUs.

>How-To-Repeat:

Install amanda from pkgsrc and set it up to run multiple dumps in parallel.

>Fix:

I guess there is a race condition between fss_open and fss_close.
Here is a possible scenario:

    A process calls fss_open while another process is in
    fss_close (this is possible because the device driver is
    marked as MPSAFE).  In fss_close, no lock is held while
    control is, for example, between mutex_exit(&sc->slock) and
    fss_ioctl(dev, FSSIOCCLR...).  So fss_open may return
    successfully during that window.  fss_close then detaches
    the device (destroying the mutexes and freeing the softc)
    before the process that opened the fss device issues the
    FSSIOCSET ioctl.  When that ioctl is issued later, it
    triggers the kernel panic.

The value of %r15 may indicate a destroyed mutex.
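
The suspected interleaving can be replayed in a small user-space
model.  This is only a sketch under stated assumptions: struct
model_softc and all model_* functions are hypothetical stand-ins,
not the real fss driver code, and the "closing" flag is one
conceivable way to close the window, not a tested or proposed
patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-unit driver state (not fss_softc). */
struct model_softc {
    bool attached;   /* softc and its mutexes still exist */
    int  open_count; /* number of current openers */
    bool closing;    /* hypothetical flag: a last close is in progress */
};

/* First half of close: done under the lock, which is then dropped. */
static void
model_close_begin(struct model_softc *sc, bool mark_closing)
{
    sc->open_count--;
    if (mark_closing && sc->open_count == 0)
        sc->closing = true;
}

/* Open: succeeds while the unit is attached and not marked as closing. */
static bool
model_open(struct model_softc *sc)
{
    if (!sc->attached || sc->closing)
        return false;
    sc->open_count++;
    return true;
}

/* Second half of close: detach (mutexes destroyed, softc freed). */
static void
model_close_finish(struct model_softc *sc)
{
    sc->attached = false;
    sc->closing = false;
}

/*
 * Replay: close begins, open slips into the unlocked window, close
 * finishes, then the opener issues its ioctl.  Returns true if that
 * ioctl would touch already-freed state.
 */
static bool
replay_hits_freed_softc(bool mark_closing)
{
    struct model_softc sc = { true, 1, false };
    bool opened;

    model_close_begin(&sc, mark_closing);
    opened = model_open(&sc);
    model_close_finish(&sc);
    return opened && !sc.attached;
}
```

In this model, replay_hits_freed_softc(false) returns true, which
matches the scenario above, while replay_hits_freed_softc(true)
returns false because the open is refused during the window.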
