Subject: Re: Data corruption issues possibly involving cgd(4)
To: Daniel Carosone <dan@geek.com.au>
From: Nino Dehne <ndehne@gmail.com>
List: current-users
Date: 01/17/2007 23:58:56

On Thu, Jan 18, 2007 at 07:31:47AM +1100, Daniel Carosone wrote:
> Nino, are you running a kernel with DIAGNOSTIC and/or DEBUG?  Looking
> at the cgd panic you found, I'm guessing not, because the path we see
> to that problem would have involved one or more likely DIAGNOSTIC
> messages.

Not yet, but that just went on my list of things to try.


> The combination of ffsv2, >1Tb, and older kernels
> smells fishy to me, and any additional clues they may provide could be
> vital.  Reproducing that combination on a test machine, without cgd
> and R5, would also be a good idea if feasible.

Unfortunately, I'll have to pass. That filesystem is the only one of that
size I have access to. Perhaps someone else is running that combination;
it shouldn't be that uncommon.

My plans for today:

1) Boot DIAGNOSTIC+DEBUG kernel
2) Run fsck -f[1]
3) Last resort: transfer disks to my desktop machine and try to reproduce
   the problem

Best regards,

ND


[1] That fs went through several full fscks recently while I was fighting
    a stubborn disk controller[2]. I never noticed anything unusual.
    I'm still going to try, though.
[2] A SiI0680 cmdide(4) controller was apparently causing lockups during
    heavy I/O. The disks are now attached as master+slave on the onboard
    viaide(4) (4 disks), plus a master on a PCI hptide(4) (the 5th disk).