Subject: Re: Data corruption issues, probably involving ffs2 and >1Tb (SOLVED?)
To: Brian Buhrow <buhrow@lothlorien.nfbcal.org>
From: Greg Oster <oster@cs.usask.ca>
List: current-users
Date: 01/26/2007 17:03:09
Brian Buhrow writes:
> 	Hello.  I'm happy to report that my md5 tests, which I've been running
> over the last 72 hours, have yielded no errors whatsoever.  These have
> been using the pdcsata(4) driver, which seems to work fine.  
> 	Glad you found the trouble.  It's possible there's a bug in the
> hptide(4) driver; it's also possible that there's a bug in the specific
> revision of the hpt chipset you have in your card, which the driver doesn't
> work around.  Are there many folks using the hptide(4) driver, and, if so,
> what revisions of chips are they using it with?
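
The repeated md5 test mentioned above can be sketched roughly as follows. This
is an illustration only, not the script actually used in this thread: the
function names, the chunk size and the default run count are assumptions. It
reads one file several times and reports any run whose digest differs from
the first read.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repeated_md5_check(path, runs=200):
    """Read the file `runs` times in total and compare every digest
    against the first one.

    Returns (reference_digest, mismatches), where mismatches is a list of
    (run_index, digest) pairs for every read that disagreed.
    """
    reference = md5_of_file(path)
    mismatches = []
    for run in range(1, runs):
        current = md5_of_file(path)
        if current != reference:
            mismatches.append((run, current))
    return reference, mismatches
```

An empty mismatch list means every read produced the same digest. Note that
repeated reads of a small file may be served from the buffer cache rather
than from the disk, so a test file larger than RAM exercises the controller
more reliably.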

I've been running disks on this:

hptide0 at pci0 dev 19 function 0
hptide0: Triones/Highpoint HPT370 IDE Controller
hptide0: bus-master DMA support present
hptide0: primary channel wired to native-PCI mode
hptide0: using irq 10 for native-PCI interrupt

for *ages* now.  (I had problems with lockups early on, before I upgraded 
the BIOS on the thing... but since then it's been 100% solid and used 
to drive disks under RAIDframe sets.)

Later...

Greg Oster

> On Jan 24,  7:35pm, Nino Dehne wrote:
> } Subject: Re: Data corruption issues, probably involving ffs2 and >1Tb (SOL
> } Hi there,
> } 
> } first, I'm feeling really stupid and I'm terribly sorry to have caused
> } such an uproar. It appears that the issue _was_ hardware-based after all.
> } At least that's how things look currently. Let me explain:
> } 
> } Before messing around further I wanted to try the setup in my desktop
> } box. So I swapped disks, using a different add-on controller than in
> } the server and also using different cables.
> } 
> } The issue didn't show up. OK, a bit let down that the new server hardware
> } might be flaky and not knowing exactly which part of it, I tried running
> } the same setup in the desktop with the add-on controller from the server
> } (HPT371 single-channel). This brought back the dreaded no-panic-no-nothing-
> } lockups I had experienced in the server earlier already. Back then, I
> } used both the HPT and an additional SiI0680 cmdide(4) controller so that
> } all disks had their dedicated channel. Seeing those lockups on the desktop
> } now immediately raised a flag.
> } 
> } It dawned on me that the cause of the lockups earlier might not have been
> } the cmdide(4) controller I ripped out but instead the hptide(4) one. The
> } cmdide(4) had other issues in the desktop box, though (lost interrupts).
> } 
> } I swapped all disks back to the server and replaced the HPT with a Promise
> } Fasttrak100. And what can I say, 200 runs without a single error. I will
> } watch things closely but I'm confident.
> } 
> } I still don't understand the symptoms fully, though.
> } 
> } On Mon, Jan 22, 2007 at 08:45:19AM +1100, Daniel Carosone wrote:
> } > > As a wild guess, I resolved all IRQ conflicts on the machine. 
> } > > [..]
> } > > Both steps helped nothing to resolve the issue.
> } > 
> } > These were unlikely at this point, but thanks for going to the effort
> } > of eliminating them.
> } 
> } As it turned out, nothing seems to be unlikely. :/ I would have never
> } expected the controller to be flaky either. Especially not when I do huge
> } transfers from a raw device without an error. Do you think there might
> } still be a bug in NetBSD, but instead of the FFS code it's hptide(4) with
> } that specific controller?
> } 
> } Anyway, thanks a lot for your efforts everyone and sorry for the trouble.
> } 
> } Best regards,
> } 
> } ND
> >-- End of excerpt from Nino Dehne
>