Subject: Re: Data corruption issues, probably involving ffs2 and >1Tb (SOLVED?)
To: Brian Buhrow <buhrow@lothlorien.nfbcal.org>
From: Greg Oster <oster@cs.usask.ca>
List: tech-kern
Date: 01/26/2007 17:03:09
Brian Buhrow writes:
> Hello. I'm happy to report that my md5 tests, which I've been running
> over the last 72 hours, have yielded no errors whatsoever. These have
> been using the pdcsata(4) driver, which seems to work fine.
> Glad you found the trouble. It's possible there's a bug in the
> hptide(4) driver; it's also possible that there's a bug in the specific
> revision of the hpt chipset you have in your card, which the driver doesn't
> work around. Are there many folks using the hptide(4) driver, and, if so,
> what revisions of chips are they using it with?
I've been running disks on this:
hptide0 at pci0 dev 19 function 0
hptide0: Triones/Highpoint HPT370 IDE Controller
hptide0: bus-master DMA support present
hptide0: primary channel wired to native-PCI mode
hptide0: using irq 10 for native-PCI interrupt
for *ages* now. (I had problems with lockups early on before I upgraded
the BIOS on the thing... but since then it's been 100% solid and used
to drive disks under RAIDframe sets..)
Later...
Greg Oster
> On Jan 24, 7:35pm, Nino Dehne wrote:
> } Subject: Re: Data corruption issues, probably involving ffs2 and >1Tb (SOL
> } Hi there,
> }
> } first, I'm feeling really stupid and I'm terribly sorry to have caused
> } such an uproar. It appears that the issue _was_ hardware-based after all.
> } At least that's how things look currently. Let me explain:
> }
> } Before messing around further I wanted to try the setup in my desktop
> } box. So I swapped disks, using a different add-on controller than in
> } the server and also using different cables.
> }
> } The issue didn't show up. OK, a bit let down that the new server hardware
> } might be flaky and not knowing exactly which part of it, I tried running
> } the same setup in the desktop with the add-on controller from the server
> } (HPT371 single-channel). This brought back the dreaded no-panic-no-nothing-
> } lockups I had experienced in the server earlier already. Back then, I
> } used both the HPT and an additional SiI0680 cmdide(4) controller so that
> } all disks had their dedicated channel. Seeing those lockups on the desktop
> } now immediately raised a flag.
> }
> } It dawned on me that the cause of the lockups earlier might not have been
> } the cmdide(4) controller I ripped out but instead the hptide(4) one. The
> } cmdide(4) had other issues in the desktop box, though (lost interrupts).
> }
> } I swapped all disks back to the server and replaced the HPT with a Promise
> } FastTrak100. And what can I say, 200 runs without a single error. I will
> } watch things closely but I'm confident.
> }
> } I still don't understand the symptoms fully, though.
> }
> } On Mon, Jan 22, 2007 at 08:45:19AM +1100, Daniel Carosone wrote:
> } > > As a wild guess, I resolved all IRQ conflicts on the machine.
> } > > [..]
> } > > Neither step helped to resolve the issue.
> } >
> } > These were unlikely at this point, but thanks for going to the effort
> } > of eliminating them.
> }
> } As it turned out, nothing seems to be unlikely. :/ I would have never
> } expected the controller to be flaky either. Especially not when I do huge
> } transfers from a raw device without an error. Do you think there might
> } still be a bug in NetBSD, but instead of the FFS code it's hptide(4) with
> } that specific controller?
> }
> } Anyway, thanks a lot for your efforts everyone and sorry for the trouble.
> }
> } Best regards,
> }
> } ND
> >-- End of excerpt from Nino Dehne
>