Subject: Re: Data corruption issues, probably involving ffs2 and >1Tb (SOLVED?)
To: Nino Dehne , Daniel Carosone <dan@geek.com.au>
From: Brian Buhrow <buhrow@lothlorien.nfbcal.org>
List: current-users
Date: 01/24/2007 10:45:01
	Hello.  I'm happy to report that my md5 tests, which I've been running
over the last 72 hours, have yielded no errors whatsoever.  These have
been using the pdcsata(4) driver, which seems to work fine.
	Glad you found the trouble.  It's possible there's a bug in the
hptide(4) driver; it's also possible that there's a bug in the specific
revision of the HPT chipset on your card, which the driver doesn't
work around.  Are there many folks using the hptide(4) driver, and, if so,
what revisions of chips are they using it with?
-Brian

On Jan 24,  7:35pm, Nino Dehne wrote:
} Subject: Re: Data corruption issues, probably involving ffs2 and >1Tb (SOL
} Hi there,
} 
} First, I'm feeling really stupid and I'm terribly sorry to have caused
} such an uproar. It appears that the issue _was_ hardware-based after all.
} At least that's how things look currently. Let me explain:
} 
} Before messing around further I wanted to try the setup in my desktop
} box. So I swapped disks, using a different add-on controller than in
} the server and also using different cables.
} 
} The issue didn't show up. OK, a bit let down that the new server hardware
} might be flaky and not knowing exactly which part of it, I tried running
} the same setup in the desktop with the add-on controller from the server
} (HPT371 single-channel). This brought back the dreaded no-panic-no-nothing-
} lockups I had experienced in the server earlier already. Back then, I
} used both the HPT and an additional SiI0680 cmdide(4) controller so that
} all disks had their dedicated channel. Seeing those lockups on the desktop
} now immediately raised a flag.
} 
} It dawned on me that the cause of the lockups earlier might not have been
} the cmdide(4) controller I ripped out but instead the hptide(4) one. The
} cmdide(4) had other issues in the desktop box, though (lost interrupts).
} 
} I swapped all disks back to the server and replaced the HPT with a Promise
} Fasttrak100. And what can I say, 200 runs without a single error. I will
} watch things closely but I'm confident.
} 
} I still don't understand the symptoms fully, though.
} 
} On Mon, Jan 22, 2007 at 08:45:19AM +1100, Daniel Carosone wrote:
} > > As a wild guess, I resolved all IRQ conflicts on the machine.
} > > [..]
} > > Both steps helped nothing to resolve the issue.
} > 
} > These were unlikely at this point, but thanks for going to the effort
} > of eliminating them.
} 
} As it turned out, nothing seems to be unlikely. :/ I would have never
} expected the controller to be flaky either. Especially not when I do huge
} transfers from a raw device without an error. Do you think there might
} still be a bug in NetBSD, only in hptide(4) with that specific controller
} rather than in the FFS code?
} 
} Anyway, thanks a lot for your efforts everyone and sorry for the trouble.
} 
} Best regards,
} 
} ND
>-- End of excerpt from Nino Dehne