Subject: Re: 1.6.1 cgd panics
To: None <netbsd-users@netbsd.org>
From: Jorgen Lundman <lundman@lundman.net>
List: netbsd-users
Date: 11/29/2003 14:02:50
I have just tried a -current kernel, Nov 22nd to be exact. It fared better
in that it stayed up for nearly 5 days, but on Saturday at 04:30 it died. (That's
when the weekly job runs, no?)
If it panicked and rebooted it would be nice, but it does not. It responds to ping,
and you can connect to listening ports, but no I/O comes through (no ssh greeting,
etc.).
Not sure why weekly manages to kill it two weeks in a row (first the 1.6.1 kernel,
now the -current kernel). Sure, it does the updatedb call, but I modified it not
to run find on the cgd devices, so it's only scanning a normal 1.6.1 boot partition.
(I do my own updatedb on the crypt disks every hour!)
Are there any known issues with cgd at all? Does anyone have a patch to make the
kernel reboot when it sees a magic MASK in a ping packet? :)
Lund
Jorgen Lundman wrote:
>
> This looks quite similar to
>
> http://mail-index.netbsd.org/netbsd-help/2003/01/27/0027.html
>
> in terms of the panic, but since I'm on 1.6.1 I assume it has already been
> fixed?
>
> "I think there was a bug in the ufs_daddr_t change which could cause this.
> This should be fixed now." doesn't give me much of a clue. Can I check whether
> my sources are good?
>
> My ffs_inode.c (that contains ffs_truncate) is version:
>
> /* $NetBSD: ffs_inode.c,v 1.51 2001/12/18 10:57:21 fvdl Exp $ */
>
>
> or, if someone can recommend a -current that has been stable for you, I
> can give that a quick go.
>
> Lund
>
>
>
> Jorgen Lundman wrote:
>
>>
>> Hello,
>>
>> Netbsd-1.6.1 i386
>>
>> Using the backported cgd-1.6-20030912.diff, as well as my own backports
>> of the nvidia IDE controller (just a matter of adding the product code)
>> and of nvidia's ex interface.
>>
>> I seem to get a panic every two days or so, and the cause appears to
>> be in the filesystem area. The first one was in ffs_alloc, but that is
>> just from memory, and no core was saved. Lost cgd2 and cgd3 from that.
>> Tried fsck'ing, but it was a real mess; it ended up in an infinite loop
>> and would never finish. Most likely a block was written unencrypted, or
>> one was read but not decrypted, which would be a somewhat less than
>> desired thing.
>>
>> The second panic left a core; specifically:
>>
>> panic: blkfree: freeing free frag
>> #0 0x1 in ?? ()
>> #1 0xc03711b7 in cpu_reboot ()
>> #2 0xc029566e in panic ()
>> #3 0xc028719d in lockmgr ()
>> #4 0xc02b8448 in genfs_lock ()
>> #5 0xc02b744e in VOP_LOCK ()
>> #6 0xc02b6c11 in vn_lock ()
>> #7 0xc02b0638 in vget ()
>> #8 0xc024d767 in ffs_sync ()
>> #9 0xc02b2b36 in sys_sync ()
>> #10 0xc02b1b56 in vfs_shutdown ()
>> #11 0xc037118f in cpu_reboot ()
>> #12 0xc029566e in panic ()
>> #13 0xc02421c4 in ffs_blkfree ()
>> #14 0xc0244756 in ffs_truncate ()
>> #15 0xc02b7745 in VOP_TRUNCATE ()
>> #16 0xc025f8ec in ufs_inactive ()
>> #17 0xc02b73ee in VOP_INACTIVE ()
>> #18 0xc02b0742 in vput ()
>> #19 0xc02631e5 in ufs_remove ()
>> #20 0xc02b71a1 in VOP_REMOVE ()
>> #21 0xc02b4463 in sys_unlink ()
>> #22 0xc037aff3 in syscall_plain ()
>> ---Type <return> to continue, or q <return> to quit---
>> #23 0xc0100e74 in syscall1 ()
>> can not access 0xbfbfdc54, invalid translation (invalid PDE)
>> can not access 0xbfbfdc54, invalid translation (invalid PDE)
>> Cannot access memory at address 0xbfbfdc54
>> (gdb)
>>
>> fsck'ing doesn't look good; cgd1 appears to be gone. Not sure how many
>> others are affected (about 14 in total).
>>
>> Not using softdep, just plain vanilla mount.
>>
>> Is it worth trying a (stable?) -current kernel to see if things go
>> better? Can someone recommend one? Is there a known problem with the
>> version of kernel/cgd that I am running?
>>
>> It is too much to expect fsck to handle a fs with this level of
>> corruption, but is anyone interested in seeing what issues I get?
>>
>> Sincerely,
>>
>> Lundy
>>
>>
>
--
Jorgen Lundman | <lundman@lundman.net>
Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo | +81 (0)90-5578-8500 (cell)
Japan | +81 (0)3 -3375-1767 (home)