Subject: Re: ffs crash after raid reconfiguration
To: Kazushi Marukawa (Jam) <jam@pobox.com>
From: Greg Oster <oster@cs.usask.ca>
List: current-users
Date: 07/21/2001 02:47:37
Kazushi Marukawa writes:
> Hi,
> 
> Today, after one disk failed in a four-disk raid5
> system, I removed the disk and started the system.  I forgot
> to start it in single user mode.  The system failed the auto
> raid configuration because of misalignment (my drives are
> IDE). 

Ummm...  The autoconfig is not supposed to care what components are on
what drives, and what IDs those drives might be at.  Can you send me the
'dmesg' from this boot?

> It also failed the fsck because of the failure of
> auto raid configuration.  The system started in single user
> mode.  I changed the raid configuration file and reconfigured

Hmmm. 

> the system with "raidctl -c raid0.conf raid0".  I mounted
> raid0 without fsck because it was clean. 

You did a normal shutdown to remove the failed drive, right?

> I exited from
> single user mode.  The system became available without a
> reboot.  I started my job.  Then the system crashed with the
> following message and trace.
> 
> start = 2187, len = 117, fs = /home

/home is on raid0, I take it?

> offset=824 824
> panic: ffs_alloccg: map corrupted
> Stopped in pid 524 (ftp) at     cpu_Debugger+0x4:      leave
> db> trace
> cpu_Debugger(0,88c,88b,d3e70a08,c0323489) at cpu_Debugger+0x4
> panic(c04af92e,c04af91f,338,338,4458) at panic+0xa0
> ffs_mapsearch(c0b72000,cf07c000,4458,8,2000) at ffs_mapsearch+0x18d
> ffs_alloccgblk(d3f2fb5c,c5ca3b20,4458,8b8,c0b72000) at ffs_alloccgblk+0x36d
> ffs_alloccg(d3f2fb5c,8b8,2740458,2000,10) at ffs_alloccg+0xf0
> ffs_hashalloc(d3f2fb5c,8b8,2740458,2000,c03219c8) at ffs_hashalloc+0x23
> ffs_alloc(d3f2fb5c,35,2740458,2000,c0c88880) at ffs_alloc+0xcb
> ffs_balloc(d3e70c54,5a8,d3e70cf0,0,c049c740) at ffs_balloc+0xa3b
> VOP_BALLOC(d3d1737c,6a000,0,5a8,c0c88880) at VOP_BALLOC+0x4c
> ffs_ballocn(d3e70cf0,d3d1737c,d3e70d48,d3d17424,c049c780) at ffs_ballocn+0x9b
> VOP_BALLOCN(d3d1737c,6a000,0,5a8,0) at VOP_BALLOCN+0x4c
> ufs_balloc_range(d3d1737c,6a000,0,5a8,0) at ufs_balloc_range+0x2f3
> ffs_write(d3e70e78,1,c049c020,d3d1737c,d3e70f04) at ffs_write+0x208
> VOP_WRITE(d3d17c,d3e70f04,1,c0c88880,d3d1737c) at VOP_WRITE+0x38
> vn_write(d3df2904,d3df2920,d3e70f04,c0c88880,1) at vn_write+0x9e
> dofilewrite(d3e2dc88,6,d3df2904,8088000,5a8) at dofilewrite+0x94
> sys_write(d3e2dc88,d3e70f80,d3e70f78) at sys_write+0x40
> syscall_plain(1f,1f,1f,1f,805f041) at syscall_plain+0x98
> db> c
> syncing disks... panic: lockmgr: locking against myself
> Stopped in pid 524 (ftp) at     cpu_Debugger+0x4:      leave
> db> reboot
> 
> 
> I guess there may be a problem around the raid
> reconfiguration.  Both today and last time, I got panics
> when I brought the system up without a reboot after the
> reconfiguration. 

Sorry... I'm not sure what you mean here.  Please explain with a bit more 
detail.  If the 'raidctl -I 1234' has been done, and autoconfiguration is 
turned on, then it shouldn't matter where the components are and what IDs the 
drives are at -- the autoconfig code is supposed to find them, sort out which 
ones belong to what sets, and glue the appropriate ones together.  Can you 
also send me the output of 'raidctl -s raid0'? (for whatever state raid0 is in 
now.)
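
A rough sketch of the grouping step described above, in Python rather than 
the actual kernel C.  The ComponentLabel fields and the group_into_sets name 
are made up for illustration and are not the real RAIDframe label layout; the 
only point is that components are matched by what their labels say, not by 
which device they happen to show up as.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentLabel:
    # Hypothetical, simplified label; not the real RAIDframe structure.
    device: str        # where the component was found on this boot, e.g. "wd1e"
    serial: int        # set serial number, as given to 'raidctl -I'
    column: int        # position of this component within its set
    num_columns: int   # how many components the set expects


def group_into_sets(labels):
    """Group components by serial number and order each group by column.

    Returns {serial: [device or None, ...]}; None marks a column whose
    component was not found.  Device names play no part in the grouping.
    """
    by_serial = defaultdict(list)
    for label in labels:
        by_serial[label.serial].append(label)

    sets = {}
    for serial, members in by_serial.items():
        slots = [None] * members[0].num_columns
        for m in members:
            slots[m.column] = m.device
        sets[serial] = slots
    return sets


# Three surviving components of a four-component set, found out of order.
found = [
    ComponentLabel("wd2e", 1234, 3, 4),
    ComponentLabel("wd0e", 1234, 0, 4),
    ComponentLabel("wd1e", 1234, 2, 4),
]
print(group_into_sets(found))
# prints {1234: ['wd0e', None, 'wd1e', 'wd2e']}
```

With one component missing, a set like this should still come up, just in 
degraded mode, regardless of how the remaining drives got renumbered.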

> I may be wrong, but could someone check
> the code?  Thanks.
> 
>   NetBSD sou.nerv.org 1.5W NetBSD 1.5W (sou) #7: Tue Jul 17 04:12:37 CDT 2001
>  jam@sou.nerv.org:/mnt/src/sys/arch/i386/compile/sou i386
>   Also, I'm using softdep on this raid system this time.
> 
> Regards,
> -- Kazushi
> Newlan's Truism:
> 	An "acceptable" level of unemployment means that the government
> economist to whom it is acceptable still has a job.

Later...

Greg Oster