Subject: Re: raidframe problems (revisited)
To: Louis Guillaume <lguillaume@berklee.edu>
From: Louis Guillaume <lguillaume@berklee.edu>
List: netbsd-users
Date: 03/10/2007 18:37:15
Louis Guillaume wrote:
> David Brownlee wrote:
>> On Tue, 13 Feb 2007, Volkmar Seifert wrote:
>>
>>>> First let me say that I have had excellent experiences with raidframe.
>>>> Every one of these good experiences has involved using SCSI disks.
>>>>
>>>> Every time I've used raidframe with some form of ATA disk there has been
>>>> trouble. And now I've had another such experience to report.
>>> Well, I have just set up a machine running two ATA-Disks in RAID-1
>>> using RAIDFrame, and it
>>> "just worked(tm)", right out of the box, so generally speaking there
>>> should not be any
>>> problems, no matter what system is used, as long as each hardware
>>> component in itself is
>>> working fine with netbsd.
>>>
>>> Good luck, even if this mail probably was not of much help.
>>     Just to chime in on a datapoint - I have a dozen or so
>>     machines all running raidframe RAID1 on IDE and SATA disks,
>>     from ancient 10GB disks in a PIII/550 to WD raptors in
>>     AMD64 X2 rackmounts.
>>
>>     The only issue I have is if the power fails the parity
>>     rebuild takes _forever_, which could probably be mitigated
>>     a little by raidframe splitting the parity clean flag into
>>     a flag per disk section.
>>
>>     The only NetBSD machines I have which do not run raidframe are
>>     laptops (single disk) and a couple of test machines I really
>>     do not care about.
> 
> 
> 
> Well, all the evidence points to my bad hardware. Thank you all for your
> help. Looks like I'll have to go scrounge a new SATA controller!!
> 
> Louis

Ok - I changed my SATA controller to a...

satalink0 at pci0 dev 11 function 0
satalink0: Silicon Image SATALink 3512 (rev. 0x01)
satalink0: SATALink BA5 register space disabled
satalink0: bus-master DMA support present
satalink0: primary channel wired to native-PCI mode
satalink0: using irq 12 for native-PCI interrupt
atabus0 at satalink0 channel 0
satalink0: secondary channel wired to native-PCI mode
atabus1 at satalink0 channel 1
satalink0: port 0: device present, speed: 1.5Gb/s

... which is listed on the Supported Hardware page. But problems still
persist when I use raidframe.

I created normal filesystems on these disks and copied large amounts of
data to them several times and left them mounted and active for several
days with no trouble.

With raidframe it is a different story. As soon as the parity is clean
on a fresh raid1, I make filesystems and move data onto them, then
unmount and fsck. fsck finds problems.
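
One way the copy-then-check test above could be narrowed down is to write
a known, reproducible pattern and compare it block by block after a
remount, so that any damage shows up with an offset instead of only as
fsck complaints. The following is a minimal sketch, not something I ran
on this machine: it assumes Python 3.6 or later is installed, and the
names (write_pattern, first_bad_chunk, the seed, the 1 MiB chunk size)
are made up for illustration.

```python
import hashlib
import os

CHUNK = 1 << 20          # 1 MiB per block (arbitrary choice for this sketch)
SEED = b"raidtest"       # hypothetical seed; any fixed bytes will do


def make_block(index, seed=SEED):
    """Return CHUNK bytes that depend only on the seed and the block index."""
    return hashlib.shake_256(seed + index.to_bytes(8, "big")).digest(CHUNK)


def write_pattern(path, n_chunks, seed=SEED):
    """Write n_chunks reproducible blocks to path and force them to disk."""
    with open(path, "wb") as f:
        for i in range(n_chunks):
            f.write(make_block(i, seed))
        f.flush()
        os.fsync(f.fileno())


def first_bad_chunk(path, n_chunks, seed=SEED):
    """Return the index of the first block that differs, or None if all match."""
    with open(path, "rb") as f:
        for i in range(n_chunks):
            if f.read(CHUNK) != make_block(i, seed):
                return i
    return None

# Intended use (the path is a placeholder):
#   write_pattern("/mnt/test/pattern.bin", 4096)
#   ... unmount and remount the filesystem ...
#   print(first_bad_chunk("/mnt/test/pattern.bin", 4096))
```

The unmount and remount between the two calls matters: without it the
read may be served from the buffer cache and never touch the disks.
Running the same test on a plain filesystem and on the raid1 would show
whether the corruption only appears with raidframe in the path.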

I even tried creating an LFS filesystem to mix things up, but that came
to a most ungraceful end...

ino too large, reclen=0, reclen>space, or reclen&3!=0
dp->d_ino = 0x33706d2e  dp->d_reclen = 0x20
maxino = 0xfc6e spaceleft = 0x164
name size misstated
DIRECTORY CORRUPTED  I=426  OWNER=louis MODE=40755
SIZE=1024 MTIME=Feb 17 12:01 2007
Memory fault (core dumped)


Is there anything else I can do to pin down exactly where the problem lies?

Now I'm re-checking the non-raidframe behaviour. Any advice would
be most appreciated. Thanks!

Louis