NetBSD-Bugs archive


kern/41797: kernel panic in kern_physio when tape reaches EOM during write if DIAGNOSTIC is enabled; without DIAGNOSTIC the error status is lost



>Number:         41797
>Category:       kern
>Synopsis:       kernel panic in kern_physio when tape reaches EOM during write if DIAGNOSTIC is enabled; without DIAGNOSTIC the error status is lost
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jul 29 17:45:00 +0000 2009
>Originator:     Wolfgang Stukenbrock
>Release:        NetBSD 4.0
>Organization:
Dr. Nagler & Company GmbH
        
>Environment:
        
        
System: NetBSD s012 4.0 NetBSD 4.0 (NSW-S012) #9: Fri Mar 13 12:31:52 CET 2009 
wgstuken@s012:/usr/src/sys/arch/amd64/compile/NSW-S012 amd64
Architecture: x86_64
Machine: amd64
>Description:
        We have a VXA320 tape drive connected to an Adaptec 29160 controller
        on this system.
        For debugging purposes we run a kernel with DIAGNOSTIC enabled.
        "Sometimes" the system panics with an assertion in kern_physio.c,
        line 201: "KASSERT((bp->b_flags & B_ERROR) == 0);".
        This check is only compiled in when the kernel is built with
        DIAGNOSTIC, so most users will never see the panic.
        (But then the first EOM error status is lost; see the analysis below.)
        It took some time to find out that the cause is the st driver.
        We are using the nrst devices, so no fixed block mode, and the
        default behaviour of these is EEW disabled.

        Now the following happens when EOM is reached on the tape:
        The XS command is returned with XS_SENSE from the ahc driver. The
        transfer count is equal to the number of bytes requested.
        The st driver detects that EOM is the cause of the problem. Because
        EEW is not enabled, it returns EIO.
        The st driver is called again to finish the packet with EIO
        indicated. It sets B_ERROR in the buffer.
        The physio done routine now checks whether all bytes have been
        transferred, and they have (!), so it reaches the assertion
        above -> panic.

        If EEW is enabled on the tape, the st driver returns 0 (no error)
        after detecting EOM and no problem occurs.

        I'm not really familiar with the return semantics of the HW
        controllers. The code seems to expect that the failed command is
        returned and the sense info has to be requested afterwards.
        In the case above, the ahc driver already returns the sense
        information. The driver seems to be able to handle this too.

        I don't know if it is a legal situation that all bytes have been
        written by the tape but EOM is signaled anyway.
        Perhaps this is a special case of the VXA tape drive.

        Nevertheless: the code in physio_done() in kern_physio.c looks wrong
        to me, because it does not update the error status in mbp if all
        bytes have been transferred but B_ERROR has been set too. This loses
        the error information, and no error is reported to user level as it
        should be.
        In fact, without DIAGNOSTIC in the kernel config, the first EOM hit
        by a write is not returned to user level! I've tested it.
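
        The case described above can be modelled in user space. The following
        is a minimal sketch, not the kernel code: struct model_buf,
        struct model_master, the B_ERROR value and the two function names are
        hypothetical stand-ins that keep only the fields relevant here. The
        "old" variant mirrors the check as it stands, the "new" variant also
        looks at B_ERROR.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define B_ERROR 0x1	/* stand-in value, not the kernel's definition */

/* Hypothetical, heavily reduced stand-in for struct buf. */
struct model_buf {
	int	b_flags;
	int	b_error;
	size_t	b_bcount;	/* bytes requested */
	size_t	b_resid;	/* bytes not transferred */
};

/* Hypothetical stand-in for the error state kept in the master buffer. */
struct model_master {
	int	flagged;	/* nonzero once B_ERROR would be set in mbp */
	int	error;		/* error code that would be stored in mbp */
};

/* Copy the per-buffer error status into the master state. */
static void
record_error(const struct model_buf *bp, struct model_master *mbp)
{
	mbp->error = (bp->b_flags & B_ERROR) ? bp->b_error : 0;
	mbp->flagged = 1;
}

/*
 * Current behaviour: only a short transfer enters error handling.
 * Returns 1 where the DIAGNOSTIC kernel's KASSERT would fire, else 0.
 */
static int
physio_done_old(const struct model_buf *bp, struct model_master *mbp)
{
	assert(bp->b_resid <= bp->b_bcount);
	size_t todo = bp->b_bcount;
	size_t done = todo - bp->b_resid;

	if (done != todo) {
		record_error(bp, mbp);
		return 0;
	}
	/* Full transfer: B_ERROR is never copied to mbp on this path. */
	return (bp->b_flags & B_ERROR) != 0;
}

/* Proposed behaviour: a set B_ERROR also enters error handling. */
static void
physio_done_new(const struct model_buf *bp, struct model_master *mbp)
{
	assert(bp->b_resid <= bp->b_bcount);
	size_t todo = bp->b_bcount;
	size_t done = todo - bp->b_resid;

	if (done != todo || (bp->b_flags & B_ERROR) != 0)
		record_error(bp, mbp);
}
```

        For a buffer with b_resid == 0, B_ERROR set and b_error == EIO, the
        old variant leaves the master state untouched and reports that the
        assertion would fire, while the new variant records EIO.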

        I think the way to fix this is to check the B_ERROR flag in
        physio_done() too, and to enter error processing if either not all
        requested bytes have been transferred or an error status is set.
        Remark: this exposes another bug in physio() some lines below. The
        check on delta must allow 0 too, because we must allow an error
        even if all requested data has been transferred.
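
        Why delta can be 0 is easiest to see with numbers. A minimal sketch
        follows, assuming plain 64-bit integers in place of the uio and mbp
        fields; resid_after_error() is a hypothetical helper, not a kernel
        function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of the resid adjustment in physio() after an error.
 * uio_offset is the offset the request would have reached on success,
 * b_endoffset the offset recorded by the failing buffer.
 */
static int64_t
resid_after_error(int64_t uio_offset, int64_t b_endoffset, int64_t uio_resid)
{
	int64_t delta = uio_offset - b_endoffset;

	/* 0 is legal: the error arrived although every byte was moved. */
	assert(delta >= 0);
	return uio_resid + delta;
}
```

        With a 65536 byte request that is transferred completely but flagged
        with an error, both offsets are 65536, delta is 0 and the residual
        count stays 0. A strict "delta > 0" check would fail on exactly this
        input.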
>How-To-Repeat:
        Set up a kernel with DIAGNOSTIC, connect a SCSI tape drive to it and
        fill up the tape until it hits EOM.
        The system will panic there ...
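
        Filling the tape can be done with any program that writes blocks
        until write(2) fails. A minimal sketch, assuming a hypothetical
        helper write_until_error() and a caller that opens the tape device
        itself (for example /dev/nrst0, which is an assumption about the
        local device naming):

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Write zero-filled blocks of blksz bytes to fd until a write fails,
 * a short write occurs, or max_blocks blocks have been written.
 * Returns the number of complete blocks written; *err receives the
 * errno of the failing write, or 0 if no write failed.
 */
static long
write_until_error(int fd, size_t blksz, long max_blocks, int *err)
{
	assert(blksz > 0);
	char *buf = malloc(blksz);
	long n = 0;

	*err = 0;
	if (buf == NULL) {
		*err = ENOMEM;
		return 0;
	}
	memset(buf, 0, blksz);
	while (n < max_blocks) {
		ssize_t w = write(fd, buf, blksz);

		if (w < 0) {
			*err = errno;	/* e.g. EIO at EOM with EEW disabled */
			break;
		}
		if ((size_t)w != blksz)
			break;		/* short write: stop counting */
		n++;
	}
	free(buf);
	return n;
}
```

        On a kernel without DIAGNOSTIC, comparing the returned block count
        and *err against the known tape capacity shows whether the first
        write that hits EOM was reported to user level.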
>Fix:
        The following fix needs to be applied to sys/kern/kern_physio.c.
        With this fix the system no longer panics and the error is returned
        to user level on first EOM detection.

--- kern_physio.c       2009/07/29 13:56:10     1.1
+++ kern_physio.c       2009/07/29 17:39:11
@@ -158,7 +158,7 @@
        uvm_vsunlock(bp->b_proc->p_vmspace, bp->b_data, todo);

        simple_lock(&mbp->b_interlock);
-       if (__predict_false(done != todo)) {
+       if (__predict_false(done != todo || (bp->b_flags & B_ERROR) != 0)) {
                off_t endoffset = dbtob(bp->b_blkno) + done;

                /*
@@ -197,8 +197,6 @@
                        mbp->b_error = error;
                }
                mbp->b_flags |= B_ERROR;
-       } else {
-               KASSERT((bp->b_flags & B_ERROR) == 0);
        }

        mbp->b_running--;
@@ -438,7 +436,7 @@
                off_t delta;

                delta = uio->uio_offset - mbp->b_endoffset;
-               KASSERT(delta > 0);
+               KASSERT(delta >= 0);
                uio->uio_resid += delta;
                /* uio->uio_offset = mbp->b_endoffset; */
        } else {

>Unformatted:
        
        

