tech-kern archive
Re: raidframe oddity, take three (kernel panic!)
Hello. What version of NetBSD-4 are you running? Are you running the
straight sources as of 4.0-release? If so, then RAIDframe won't work with
such large disks. You need the 4.0-stable branch from after June 1, 2008
or so. We ran into similar problems running large disks with the
4.0-release tree.
You want rf_reconstruct.c 1.95.0.2 or later.
-Brian
On Dec 6, 1:53pm, der Mouse wrote:
} Subject: raidframe oddity, take three (kernel panic!)
} >>>> [RAIDframe parity rebuild finished at 87%]
} >>> Any ideas [...]
} >> [...]
} > But never mind; I ran raidctl -s on it and saw something unexpected.
} > It turns out one of the underlying disks threw an I/O error, [...]
}
} Now, I'm having trouble rebuilding. (Context: 4.0 i386; the disks are
} 931G SATA drives, 1 TB by the disk maker's count, on a 12-port twe
} configured as JBOD.)
}
} I did some tests, which convinced me the underlying disk had problems.
} The RAID 5 was built atop nine RAID1s each with only one member.
}
} So I replaced the failed drive and reconfigured the corresponding
} RAID1. raid0, of course, still thinks raid10e is sick and is running
} degraded. So I did "raidctl -R /dev/raid10e raid0" and boom!
}
} raid0: initiating in-place reconstruction on column 6
} panic: malloc: out of space in kmem_map
}
} This has now happened four times, each time instantly upon my running
} raidctl -R, so I am convinced there is a direct causal relation between
} the reconstruction start and the panic. The stack trace according to
} ddb (scribbled down, not cut-and-pasted) goes panic, free,
} rf_MakeReconMap, rf_MakeReconControl, rf_ContinueReconstructFailedDisk,
} rf_ReconstructInPlace, rf_ReconstructInPlaceThread.
}
} Obviously, RAID is of minimal value if it's impossible to reconstruct
} onto a failed-and-replaced drive. Is this a bug, is it a kernel config
} parameter I need to tweak, is it just effectively impossible to use
} RAIDframe RAID5 for a RAID this big (7.27+TB) with as little RAM as the
} machine has (1G), should I lose the "RAID 5 atop RAID 1" trick, what?
}
} I have the kernel coredump and the corresponding kernel for the last
} crash; the kernel was configured with `makeoptions DEBUG="-g"', so it
} has debugging symbols, and I've saved the netbsd.gdb that goes with
} that coredump.
}
} raidctl -G on the RAID5 says
}
} # raidctl config file for /dev/rraid0d
}
} START array
} # numRow numCol numSpare
} 1 9 0
}
} START disks
} /dev/raid4e
} /dev/raid5e
} /dev/raid6e
} /dev/raid7e
} /dev/raid8e
} /dev/raid9e
} /dev/raid10e
} /dev/raid11e
} /dev/raid12e
}
} START layout
} # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
} 16 1 1 5
}
} START queue
} fifo 100
}
} All the disklabels in question (on the drives and on raid4 through
} raid12) have just one partition, of type RAID, which covers all but a
} tiny sliver at the beginning of the disk. (On the real disks, the
} offset is 128 sectors; on raid4 through raid12, it's 64 sectors.)
}
} Since the RAID does not yet have any real data in it, I'm just wiping
} it and doing a parity reinit (raidctl -i), on the assumption that the
} problem with raidctl -R is something relatively easy to fix. (And even
} if it's not, starting a parity reinit doesn't _hurt_ anything.)
}
} /~\ The ASCII Mouse
} \ / Ribbon Campaign
} X Against HTML mouse%rodents-montreal.org@localhost
} / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
>-- End of excerpt from der Mouse
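To put the remark about large disks in perspective, here is a rough estimate of how many reconstruction units one component of this array has. It uses the layout from the raidctl -G output above (16 sectors per stripe unit, 1 stripe unit per reconstruction unit) and the disklabel offsets of 128 and 64 sectors. It is only a sketch under stated assumptions: the exact 931 GiB component size, ignoring RAIDframe's own reserved sectors, and especially the idea that the 4.0-release reconstruction map keeps one pointer-sized entry per reconstruction unit are guesses, not something confirmed in this thread. The helper names are hypothetical.

```python
# Back-of-envelope estimate of a RAIDframe reconstruction map's size.
# ASSUMPTIONS (not confirmed by the thread): component size of exactly
# 931 GiB, RAIDframe's own reserved sectors ignored, and one pointer-sized
# entry per reconstruction unit in the map (hypothetical model).

SECTOR_BYTES = 512


def component_sectors(disk_gib: int, label_offsets: list[int]) -> int:
    """Sectors left for the RAID 5 component after the disklabel offsets."""
    total = disk_gib * 2**30 // SECTOR_BYTES
    return total - sum(label_offsets)


def recon_units(sectors: int, sect_per_su: int, sus_per_recon_unit: int) -> int:
    """Number of whole reconstruction units covering one component."""
    ru_sectors = sect_per_su * sus_per_recon_unit
    return sectors // ru_sectors


def map_bytes(units: int, pointer_bytes: int = 4) -> int:
    """Hypothetical map size: one pointer per unit (4 bytes on i386)."""
    return units * pointer_bytes


# Layout values from the raidctl -G output and disklabels quoted above.
sectors = component_sectors(931, [128, 64])
units = recon_units(sectors, sect_per_su=16, sus_per_recon_unit=1)
size = map_bytes(units)
print(f"{units:,} reconstruction units, ~{size / 2**20:.1f} MiB of map")
# prints: 122,028,020 reconstruction units, ~465.5 MiB of map
```

If the one-entry-per-unit assumption holds, an allocation of that size would plausibly exceed what kmem_map can supply on a 1G i386 machine, which would fit the "out of space in kmem_map" panic. Because the count scales inversely with sectPerSU times SUsPerReconUnit, a larger reconstruction unit would shrink the estimate proportionally.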