tech-kern archive


Re: Problem with raidframe: bad disk or bad memory, how to tell?



        Hello.  Thanks for the tip; it may shed some light on the
problem.  If I run fsck on wd0, the disk attached to channel 0, it
reads cleanly.  A similar check on wd1, the disk attached to channel 1,
also reads cleanly as long as there's no activity on wd0.  However, when
both disks are active at the same time, wd1 returns corrupt data.
Is this a sign that the ATA controller is toast?  This isn't a new machine,
and it has been running in production for a good while.
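
For anyone who wants to reproduce the symptom without fsck, here is a
rough sketch of a read-only check.  It hashes the same region of each
device several times while all devices are being read concurrently, and
flags any device whose hashes differ between passes.  The function names
are made up for illustration, and it assumes the region is not being
written to during the test and that the length is a multiple of the
sector size, since raw devices want aligned reads.

```python
import hashlib
import threading

CHUNK = 64 * 1024  # bytes per read; a multiple of the 512-byte sector size


def hash_region(path, length, chunk=CHUNK):
    """Return the SHA-256 hex digest of the first `length` bytes of `path`."""
    digest = hashlib.sha256()
    remaining = length
    # buffering=0 avoids Python-level buffering; the path is opened read-only.
    with open(path, "rb", buffering=0) as dev:
        while remaining > 0:
            block = dev.read(min(chunk, remaining))
            if not block:  # end of file or device
                break
            digest.update(block)
            remaining -= len(block)
    return digest.hexdigest()


def repeated_hashes(path, length, passes, out):
    """Hash the same region `passes` times and store the digests in `out`."""
    out[path] = [hash_region(path, length) for _ in range(passes)]


def compare_under_load(paths, length, passes=3):
    """Read all paths concurrently, one thread per path.

    Returns {path: True} when every pass over that path produced the same
    digest, and {path: False} when the data read back was not stable.
    """
    results = {}
    threads = [
        threading.Thread(target=repeated_hashes, args=(p, length, passes, results))
        for p in paths
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return {p: len(set(digests)) == 1 for p, digests in results.items()}
```

On this machine the call would presumably look something like
compare_under_load(["/dev/rwd0d", "/dev/rwd1d"], 256 * 1024 * 1024),
assuming rwd0d and rwd1d are the raw whole-disk devices, which I believe
is the i386 convention.  A False for wd1 only when wd0 is read at the
same time would match what fsck is showing.
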
Here's how the ATA buses and disks probe:

NetBSD 3.1_STABLE (ASTERISK_SEROTEK) #0: Wed Dec 12 23:46:19 PST 2007
        buhrow%lothlorien.nfbcal.org@localhost:/usr/src/sys/arch/i386/compile/ASTERISK_SEROTEK
total memory = 2030 MB
avail memory = 1980 MB
BIOS32 rev. 0 found at 0xf0010
SMBIOS rev. 2.3 @ 0xfbbd0 (75 entries)
PCI BIOS rev. 2.1 found at 0xf0031
PCI IRQ Routing Table rev. 1.0 found at 0xf3d20, size 224 bytes (12 entries)
PCI Interrupt Router at 000:31:0 (Intel product 0x8086 compatible)
mainbus0 (root)
cpu0 at mainbus0: (uniprocessor)
cpu0: Intel Pentium 4 (686-class), 2992.80 MHz, id 0xf29
cpu0: features bfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features bfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu0: features bfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu0: features2 4400<CID,xTPR>
cpu0: "Intel(R) Pentium(R) 4 CPU 3.00GHz"
cpu0: I-cache 12K uOp cache 8-way, D-cache 8 KB 64B/line 4-way
cpu0: L2 cache 512 KB 64B/line 8-way
cpu0: ITLB 4K/4M: 64 entries
cpu0: DTLB 4K/4M: 64 entries
cpu0: using thermal monitor 1
cpu0: 16 page colors
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok

[...]

piixide0 at pci0 dev 31 function 1
piixide0: Intel 82801EB IDE Controller (ICH5) (rev. 0x02)
piixide0: bus-master DMA support present
piixide0: primary channel configured to compatibility mode
piixide0: primary channel interrupting at irq 14
atabus0 at piixide0 channel 0
piixide0: secondary channel configured to compatibility mode
piixide0: secondary channel interrupting at irq 15
atabus1 at piixide0 channel 1
piixide1 at pci0 dev 31 function 2
piixide1: Intel 82801EB Serial ATA Controller (rev. 0x02)
piixide1: bus-master DMA support present
piixide1: primary channel configured to native-PCI mode
piixide1: using irq 10 for native-PCI interrupt
atabus2 at piixide1 channel 0
piixide1: secondary channel configured to native-PCI mode
atabus3 at piixide1 channel 1

[...]

atapibus0 at atabus0: 2 targets
cd1 at atapibus0 drive 1: <TOSHIBA CD-ROM XM-6502B, , 1013> cdrom removable
cd1: 32-bit data port
cd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
wd0 at atabus0 drive 0: <WDC WD2500JB-00FUA0>
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 232 GB, 484521 cyl, 16 head, 63 sec, 512 bytes/sect x 488397168 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd0(piixide0:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA)
cd1(piixide0:0:1): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA)
wd1 at atabus1 drive 0: <WDC WD2500JB-00GVA0>
wd1: drive supports 16-sector PIO transfers, LBA48 addressing
wd1: 232 GB, 484521 cyl, 16 head, 63 sec, 512 bytes/sect x 488397168 sectors
wd1: 32-bit data port
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd1(piixide0:1:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA)

-thanks
-Brian

On Jul 10, 11:57pm, Manuel Bouyer wrote:
} Subject: Re: Problem with raidframe: bad disk or bad memory, how to tell?
} On Thu, Jul 10, 2008 at 02:24:01PM -0700, Brian Buhrow wrote:
} >     Well.  Checking each partition individually adds mystery to the
} > problem.  Each of the disks reports no errors, and the fsck'ing of each
} > disk individually looks clean.
} 
} Did you try running 2 fsck at the same time, one on each disk?
} This is to make sure the issue doesn't show up only when both disks
} are active at the same time.
} 
} > Yet, running fsck on /dev/rraid0a yields
} > similar results to the ones I already posted.  My guess is that I'm seeing
} > memory corruption.  This makes me wonder if I need to reboot the machine
} > without syncing the disks first.
} 
} If there's damage to be done, it's probably already done.
} 
} > Perhaps I need to install memtest86 to
} > see if I can exercise the memory and find out where the problem is.
} 
} Would be a good idea. 
} 
} -- 
} Manuel Bouyer <bouyer%antioche.eu.org@localhost>
}      NetBSD: 26 ans d'experience feront toujours la difference
} --
>-- End of excerpt from Manuel Bouyer



