NetBSD-Bugs archive
kern/41797: kernel panic in kern_physio when tape reaches EOM during write if DIAGNOSTIC is enabled; without DIAGNOSTIC the error status is lost
>Number: 41797
>Category: kern
>Synopsis: kernel panic in kern_physio when tape reaches EOM during write
>if DIAGNOSTIC is enabled; without DIAGNOSTIC the error status is lost
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Jul 29 17:45:00 +0000 2009
>Originator: Wolfgang Stukenbrock
>Release: NetBSD 4.0
>Organization:
Dr. Nagler & Company GmbH
>Environment:
System: NetBSD s012 4.0 NetBSD 4.0 (NSW-S012) #9: Fri Mar 13 12:31:52 CET 2009
wgstuken@s012:/usr/src/sys/arch/amd64/compile/NSW-S012 amd64
Architecture: x86_64
Machine: amd64
>Description:
We have a VXA320 tape drive connected to an Adaptec 29160 controller
on this system.
For debugging purposes we run a kernel with DIAGNOSTIC enabled.
"Sometimes" the system panics with an assertion in kern_physio.c,
line 201: "KASSERT((bp->b_flags & B_ERROR) == 0);".
This assertion is only compiled in when the kernel is built with
DIAGNOSTIC, so most users will never see the panic.
(But without it the first EOM error status is lost; see the analysis
below.)
It took some time to find out that the cause is the st driver.
We are using the nrst devices (so no fixed block mode), and their
default behaviour is EEW disabled.
The following happens when EOM is reached on the tape:
The XS command is returned with XS_SENSE from the ahc driver. The
transfer count is equal to the number of bytes requested.
The st driver detects that EOM is the cause of the problem. Because
EEW is not enabled, it returns EIO.
The st driver is called again to finish the packet with EIO indicated.
It sets B_ERROR in the buffer.
The physio-done routine now checks whether all bytes have been
transferred. They have (!), so it reaches the assertion above -> panic.
If EEW is enabled on the tape, the st driver returns 0 (no error) after
detecting EOM and no problem occurs.
I'm not really familiar with the return semantics of the HW
controllers. The code seems to expect that the failed command is
returned and the sense info then has to be requested.
In the case above, the ahc driver already returns the sense
information. The driver seems to be able to handle this too.
I don't know if it is a legal situation that all bytes have been
written by the tape but EOM is signaled anyway.
Perhaps this is a special case of the VXA tape drive.
Nevertheless: the code in kern_physio.c physio_done() looks wrong to
me, because it does not update the error status in mbp if all bytes
have been transferred but B_ERROR has been set too. This loses the
error information, and no error is reported to user level as it
should be.
In fact, without DIAGNOSTIC in the kernel config, the first EOM hit by
a write is not returned to user level! I've tested it.
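
To make the failure mode concrete, here is a minimal user-space sketch of
the original completion check. It is a reduced model, not the kernel
code: the function name, the outcome enum and the B_ERROR value are
stand-ins chosen for illustration, and only the "done != todo" decision
and the KASSERT in the else branch are taken from the report.

```c
#include <assert.h>
#include <stddef.h>

#define B_ERROR 0x1	/* stand-in value, not the kernel's definition */

/* Hypothetical outcome codes for this model only. */
enum outcome {
	OUTCOME_OK,		/* full transfer, no error flag */
	OUTCOME_ERROR_PATH,	/* error/EOM branch taken, mbp is updated */
	OUTCOME_ERROR_LOST,	/* B_ERROR set but never copied to mbp */
	OUTCOME_PANIC		/* KASSERT fires (DIAGNOSTIC kernel) */
};

/*
 * Model of the original check in physio_done(): the error branch is
 * entered only when the transfer was short (done != todo).
 */
static enum outcome
model_physio_done_orig(int b_flags, size_t done, size_t todo, int diagnostic)
{
	assert(done <= todo);

	if (done != todo)
		return OUTCOME_ERROR_PATH;

	/* else branch: KASSERT((bp->b_flags & B_ERROR) == 0); */
	if ((b_flags & B_ERROR) != 0)
		return diagnostic ? OUTCOME_PANIC : OUTCOME_ERROR_LOST;

	return OUTCOME_OK;
}
```

With a full transfer and B_ERROR set (the EOM case described above), the
model yields OUTCOME_PANIC for a DIAGNOSTIC kernel and
OUTCOME_ERROR_LOST otherwise.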
I think the way to fix this is to check the B_ERROR flag in
physio_done() too, and enter error processing if either not all
requested bytes have been transferred or an error status is set.
Remark: this exposes another bug in physio() some lines below: the
check on delta must allow 0 too. We must allow an error even if all
requested data has been transferred.
>How-To-Repeat:
Set up a kernel with DIAGNOSTIC, connect a SCSI tape to it and fill up
the tape until it hits EOM.
The system will panic there.
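
A small helper along the following lines could be used to fill the tape.
It is only a sketch: the function name, the fill pattern and the
max_blocks safety limit are assumptions made for illustration, and it
was not part of the original test setup.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Write blksz-byte blocks to fd until write() fails, returns a short
 * count, or max_blocks blocks have been written.
 * Returns the number of complete blocks written; *err_out receives the
 * errno of a failing write(), or 0 if no write() failed.
 */
static long
fill_until_error(int fd, size_t blksz, long max_blocks, int *err_out)
{
	char *buf;
	long n = 0;

	assert(blksz > 0);
	*err_out = 0;

	buf = malloc(blksz);
	if (buf == NULL) {
		*err_out = ENOMEM;
		return 0;
	}
	memset(buf, 0xA5, blksz);	/* arbitrary fill pattern */

	while (n < max_blocks) {
		ssize_t w = write(fd, buf, blksz);

		if (w < 0) {
			*err_out = errno;	/* e.g. EIO at EOM */
			break;
		}
		if ((size_t)w != blksz)
			break;			/* short write */
		n++;
	}
	free(buf);
	return n;
}
```

Called with a descriptor for the tape (for example one obtained by
opening an nrst device such as /dev/nrst0 for writing; the exact device
name is an assumption) and a large max_blocks, the loop runs until the
first write at EOM. On an unpatched non-DIAGNOSTIC kernel that write is
expected to return without an error, as described above.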
>Fix:
The following fix needs to be applied to sys/kern/kern_physio.c.
With this fix the system no longer panics and the error is returned to
user level on first EOM detection.
--- kern_physio.c 2009/07/29 13:56:10 1.1
+++ kern_physio.c 2009/07/29 17:39:11
@@ -158,7 +158,7 @@
uvm_vsunlock(bp->b_proc->p_vmspace, bp->b_data, todo);
simple_lock(&mbp->b_interlock);
- if (__predict_false(done != todo)) {
+ if (__predict_false(done != todo || (bp->b_flags & B_ERROR) != 0)) {
off_t endoffset = dbtob(bp->b_blkno) + done;
/*
@@ -197,8 +197,6 @@
mbp->b_error = error;
}
mbp->b_flags |= B_ERROR;
- } else {
- KASSERT((bp->b_flags & B_ERROR) == 0);
}
mbp->b_running--;
@@ -438,7 +436,7 @@
off_t delta;
delta = uio->uio_offset - mbp->b_endoffset;
- KASSERT(delta > 0);
+ KASSERT(delta >= 0);
uio->uio_resid += delta;
/* uio->uio_offset = mbp->b_endoffset; */
} else {
>Unformatted: