Subject: kern/12310: raidframe + siop + raid1 can lose
To: None <gnats-bugs@gnats.netbsd.org>
From: Tim Rightnour <root@polaris.garbled.net>
List: netbsd-bugs
Date: 03/01/2001 17:11:43
>Number:         12310
>Category:       kern
>Synopsis:       raidframe + siop + raid1 can lose
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Mar 01 16:02:00 PST 2001
>Closed-Date:
>Last-Modified:
>Originator:     Tim Rightnour
>Release:        -current as of 03/01/01
>Organization:
	
>Environment:
	
1.5S alpha
>Description:
I attempted to set up a mirrored (RAID 1) volume on a pair of disks.  Using the
following config file, I tried to initialize the parity:
START array
# numrow numcol numspare
1 2 0

START disks
/dev/sd0a
/dev/sd2a

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
158 1 1 1

START queue
fifo 100
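
For reference, assuming the usual 512-byte sector size, this layout makes each
stripe unit:

    158 sectors/SU * 512 bytes/sector = 80896 bytes = 0x13c00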

This produced the following error:
siop0: unable to load data DMA map: 22
It was printed five times for each disk in the RAID set.

After adding some debug printfs and a Debugger() call, I found that RAIDframe
is somehow asking the siop driver to load a DMA map for a transfer larger than
MAXPHYS.  The Debugger traceback is shown below:
siop0: unable to load data DMA map: 22
siop0: Hello: 13c00
Stopped in pid 142 (raid_parity) at     cpu_Debugger+0x4:       ret     zero,(ra)
db> trace
cpu_Debugger() at cpu_Debugger+0x4
siop_scsicmd() at siop_scsicmd+0x3c4   
scsipi_execute_xs() at scsipi_execute_xs+0x5c
scsi_scsipi_cmd() at scsi_scsipi_cmd+0x1b8
scsipi_command() at scsipi_command+0xc0
sdstart() at sdstart+0x384
sdstrategy() at sdstrategy+0x220
spec_strategy() at spec_strategy+0x7c
VOP_STRATEGY() at VOP_STRATEGY+0x3c
rf_DispatchKernelIO() at rf_DispatchKernelIO+0x270
rf_DiskIOEnqueue() at rf_DiskIOEnqueue+0x2c4
rf_DiskReadFuncForThreads() at rf_DiskReadFuncForThreads+0x16c
FireNode() at FireNode+0x74
FireNodeList() at FireNodeList+0x224 
PropagateResults() at PropagateResults+0x670
ProcessNode() at ProcessNode+0x108
rf_FinishNode() at rf_FinishNode+0x28
rf_NullNodeFunc() at rf_NullNodeFunc+0x28
FireNode() at FireNode+0x74
FireNodeArray() at FireNodeArray+0x250
rf_DispatchDAG() at rf_DispatchDAG+0x14c  
rf_VerifyParityRAID1() at rf_VerifyParityRAID1+0x654
rf_VerifyParity() at rf_VerifyParity+0x8c
rf_RewriteParity() at rf_RewriteParity+0xe4
rf_RewriteParityThread() at rf_RewriteParityThread+0x54
esigcode() at esigcode
--- root of call graph ---
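
Error 22 is EINVAL.  The siop driver creates its data DMA maps with a maximum
size of MAXPHYS, and as far as I can tell the MD bus_dma load routine rejects
anything bigger, along these lines (a sketch, not the exact source):

    /* in _bus_dmamap_load(): map->_dm_size is the size the map was
     * created with -- MAXPHYS for siop's data maps */
    if (buflen > map->_dm_size)
            return (EINVAL);        /* errno 22 */

The parity-verify DAG reads a whole stripe unit (0x13c00 bytes here) in a
single buf submitted through VOP_STRATEGY(), so nothing on that path clamps
the transfer to MAXPHYS the way physio()/minphys() would for a raw-device
read.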

	
>How-To-Repeat:
Set up a RAID 1 set with a stripe unit larger than MAXPHYS and attempt to
initialize the parity.
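
For example (the device names and serial number below are just placeholders),
with the config above saved as /root/raid0.conf, something like:

    raidctl -C /root/raid0.conf raid0
    raidctl -I 2001030101 raid0
    raidctl -iv raid0

should hit the error as soon as the parity rewrite starts.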
	
>Fix:
Dunno.  But a workaround is to pick a smaller stripe unit (sectPerSU) so that
a stripe-unit I/O fits within MAXPHYS.
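
For instance, assuming MAXPHYS is 64 KB on this machine, a layout along these
lines (128 sectors per SU = 64 KB) should keep each stripe-unit I/O within
MAXPHYS:

    START layout
    # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
    128 1 1 1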
	
>Release-Note:
>Audit-Trail:
>Unformatted: