Subject: Re: RaidFrame Partitioning
To: Greg Oster <oster@cs.usask.ca>
From: Louis Guillaume <lguillaume@berklee.edu>
List: netbsd-users
Date: 02/13/2004 17:15:06
Greg Oster wrote:

> Neil Booth writes:
> 
>>Louis  Guillaume wrote:-
>>
>>
>>>Also I've noticed some filesystem corruption popping up sporadically on 
>>>the root filesystem such as...
>>>
>>>find: /usr/share/man/cat3/getnetgrent.0: Bad file descriptor
>>>
>>>... in my daily insecurity output. I've only ever seen this with 
>>>RaidFrame. It has been happening for some time in small, subtle and 
>>
>>I had this with Raidframe raid 1 too; it was 100% reproducible doing
>>a CVS checkout of /src.  I eventually gave up after two extra
>>installs, and just use the drives in the standard way with no
>>problems.
> 
> 
> Do you still have your config files, dmesg output, disklabels, etc, 
> from this?  I'd be curious to see them....  
> 
> Thanks.
> 
> Later...
> 
> Greg Oster
> 

One thing I noticed is that this only really happens after transferring 
large amounts of data to the disk: for example, Neil's CVS checkout or my 
dump/restore to get the files onto the raid sets.
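
One rough way to check whether a large transfer left damaged files behind 
is to compare checksums of the source tree against the copy on the raid 
set. The following is only a minimal Python sketch under that assumption, 
not anything RaidFrame provides; the names file_digest and compare_trees, 
and any paths passed to them, are hypothetical.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(src_root, dst_root):
    """Return sorted relative paths of regular files under src_root whose
    copy under dst_root differs, is missing, or cannot be read."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(src) or not os.path.isfile(src):
                continue
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            try:
                if file_digest(src) != file_digest(dst):
                    bad.append(rel)
            except OSError:
                # An unreadable file on either side (for example a
                # "Bad file descriptor" error) is flagged as well.
                bad.append(rel)
    return sorted(bad)
```

Calling compare_trees() with the original tree and the restored tree would 
list the files worth a closer look. Reads may be served from the buffer 
cache, so a run after a reboot is probably more telling than one made 
right after the copy.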

After I re-partitioned and updated the raid filesystems, I noticed that 
the problem had gotten worse - horribly worse. Many (random) apps would 
core-dump and many services died during rc due to library problems.

In a fit of panic, I mounted a CD image of 1.6ZI and re-extracted all 
the sets one by one (rebooting once after extracting just the kernel).

Now the machine's been up 2 days without a hitch. As soon as something 
happens, I'll let you know.

Also - in case it matters, this is an MP machine...

NetBSD 1.6ZI (GENERIC.MP) #23: Sun Feb  1 00:16:58 UTC 2004
    louis@creator.berklee.net:/usr/obj/sys/arch/i386/compile.i386/GENERIC.MP
total memory = 255 MB
avail memory = 242 MB
BIOS32 rev. 0 found at 0xfdba0
mainbus0 (root)
mainbus0: Intel MP Specification (Version 1.4) (AMI      CNB30LE     )
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel Pentium III (686-class), 996.90 MHz, id 0x68a
cpu0: features 387fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features 387fbff<PGE,MCA,CMOV,PAT,PSE36,PN,MMX>
cpu0: features 387fbff<FXSR,SSE>
cpu0: I-cache 16 KB 32b/line 4-way, D-cache 16 KB 32b/line 4-way
cpu0: L2 cache 256 KB 32b/line 8-way
cpu0: ITLB 32 4 KB entries 4-way, 2 4 MB entries fully associative
cpu0: DTLB 64 4 KB entries 4-way, 8 4 MB entries 4-way
cpu0: serial number 0000-068A-0002-67F6-059B-2626
cpu0: calibrating local timer
cpu0: apic clock running at 132 MHz
cpu0: 8 page colors

...etc.

Louis