Subject: Re: panic while building a raid-1 set one component at a time
To: None <current-users@NetBSD.org>
From: Jeff Rizzo <riz@boogers.sf.ca.us>
List: current-users
Date: 10/05/2003 16:29:05
After noodling on this for a while, it occurs to me that the machine I'm
building this RAID set on before deploying it has only 16MB of memory...
Is that little enough to cause this particular issue? I know that
RAIDframe is somewhat memory-intensive... If so, is there anything I can
do on the kernel side to trim the rest of its memory needs so I can get
this set built? It's not going to live here permanently, but I'd
sure like to get it built before moving it to its final destination...
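A rough check of the numbers in the ddb backtrace quoted below seems to support that guess. Assuming the map built in rf_MakeReconMap() holds one 4-byte pointer per reconstruction unit, and reading its arguments 0x80 and 0x1d1c5880 as sectors per reconstruction unit and sectors per disk (both of those are assumptions, not checked against the source), the allocation size comes out to exactly the 0xe8e2c4 handed to malloc():

```python
# Numbers read off the ddb backtrace. Interpreting the rf_MakeReconMap()
# arguments as (sectors per reconstruction unit, sectors per disk) is an
# assumption, as is the one-pointer-per-unit layout of the map.
SECTORS_PER_RU = 0x80        # 128, same as sectPerSU in the component label
DISK_SECTORS = 0x1D1C5880    # 488396928, same as numBlocks in the label
POINTER_SIZE = 4             # sizeof(void *) on i386


def recon_map_bytes(disk_sectors, sectors_per_ru, pointer_size=POINTER_SIZE):
    """Bytes for a table holding one pointer per reconstruction unit."""
    num_rus = disk_sectors // sectors_per_ru
    return num_rus * pointer_size


size = recon_map_bytes(DISK_SECTORS, SECTORS_PER_RU)
print(f"{size} bytes = {size:#x} = {size / 2**20:.1f} MB")
# prints: 15262404 bytes = 0xe8e2c4 = 14.6 MB
```

If that reading is right, reconstruction wants a single ~14.6 MB kernel allocation for a 250GB component, which seems unlikely to fit in kmem_map on a 16MB machine however much else is trimmed.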
Thanks,
+j
On Sun, Oct 05, 2003 at 12:12:07PM -0700, Jeff Rizzo wrote:
> I've done this before, but not for about a year, so I'm not sure
> whether I'm doing something wrong here or what. I'm working with a GENERIC
> kernel circa September 28 on i386 (from the releng.netbsd.org snapshot
> of that day).
>
> I've got two identical disks, and constructed half of a RAID-1 set on one (I
> needed the other to bootstrap from sysinst), as the raidctl man page
> describes; it seems to be working fine in degraded mode.
>
> The two disks are wd1 and wd2; wd2 is the working component of the RAID
> set, and I'm trying to add wd1. I copied the disklabel from wd2 onto wd1,
> did a 'raidctl -a /dev/wd1a raid0', and then when I try to do the
> 'raidctl -F component0 raid0', it panics:
>
> # raidctl -a /dev/wd1a raid0
> Warning: truncating spare disk /dev/wd1a to 488396928 blocks
> Oct 5 10:26:49 /netbsd: Warning: truncating spare disk /dev/wd1a to 488396928 blocks
> # raidctl -F component0 raid0
> RECON: initiating reconstruction on row 0 col 0 -> spare at row 0 col 2
> raid0: Quiescence reached..
> panic: malloc: out of space in kmem_map
> Stopped in pid 399.1 (raid_recon) at netbsd:cpu_Debugger+0x4: leave
> db>
>
> Now, I'm wondering about the "Warning: truncating spare disk" message;
> I can't see anything different about the labels of wd1 and wd2, and I
> didn't get that message when I built wd2.
>
> One odd point: I can't seem to change the entry for wd2c in the
> disklabel; it always reverts to
>
> c: 15 0 unused 0 0 # (Cyl. 0 - 0*)
>
> no matter how I edit it with "disklabel", even though the edits always
> seem to take.
>
> Anyway, here's the entire sequence. I hope there's some clue in here
> somewhere...
>
> # disklabel wd1
> # /dev/rwd1d:
> type: ESDI
> disk: WDC WD2500JB-32F
> label: fictitious
> flags:
> bytes/sector: 512
> sectors/track: 63
> tracks/cylinder: 16
> sectors/cylinder: 1008
> cylinders: 484521
> total sectors: 488397168
> rpm: 3600
> interleave: 1
> trackskew: 0
> cylinderskew: 0
> headswitch: 0 # microseconds
> track-to-track seek: 0 # microseconds
> drivedata: 0
>
> 4 partitions:
> # size offset fstype [fsize bsize cpg/sgs]
> a: 488397105 63 RAID # (Cyl. 0*- 484520)
> c: 488397105 63 unused 0 0 # (Cyl. 0*- 484520)
> d: 488397168 0 unused 0 0 # (Cyl. 0 - 484520)
> # disklabel wd2
> # /dev/rwd2d:
> type: ESDI
> disk: WDC WD2500JB-32F
> label: fictitious
> flags:
> bytes/sector: 512
> sectors/track: 63
> tracks/cylinder: 16
> sectors/cylinder: 1008
> cylinders: 484521
> total sectors: 488397168
> rpm: 3600
> interleave: 1
> trackskew: 0
> cylinderskew: 0
> headswitch: 0 # microseconds
> track-to-track seek: 0 # microseconds
> drivedata: 0
>
> 4 partitions:
> # size offset fstype [fsize bsize cpg/sgs]
> a: 488397105 63 RAID # (Cyl. 0*- 484520)
> c: 15 0 unused 0 0 # (Cyl. 0 - 0*)
> d: 488397168 0 unused 0 0 # (Cyl. 0 - 484520)
> # raidctl -a /dev/wd1a raid0
> Warning: truncating spare disk /dev/wd1a to 488396928 blocks
> Oct 5 11:07:15 /netbsd: Warning: truncating spare disk /dev/wd1a to 488396928 blocks
> # raidctl -s raid0
> Components:
> component0: failed
> /dev/wd2a: optimal
> Spares:
> /dev/wd1a: spare
> component0 status is: failed. Skipping label.
> Component label for /dev/wd2a:
> Row: 0, Column: 1, Num Rows: 1, Num Columns: 2
> Version: 2, Serial Number: 20031005, Mod Counter: 101
> Clean: No, Status: 0
> sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
> Queue size: 100, blocksize: 512, numBlocks: 488396928
> RAID Level: 1
> Autoconfig: Yes
> Root partition: Yes
> Last configured as: raid0
> /dev/wd1a status is: spare. Skipping label.
> Parity status: DIRTY
> Reconstruction is 100% complete.
> Parity Re-write is 100% complete.
> Copyback is 100% complete.
> # raidctl -F component0 raid0
> RECON: initiating reconstruction on row 0 col 0 -> spare at row 0 col 2
> raid0: Quiescence reached..
> panic: malloc: out of space in kmem_map
> Stopped in pid 398.1 (raid_recon) at netbsd:cpu_Debugger+0x4: leave
> db> bt
> cpu_Debugger(0,e8f000,c087c000,0,e8f000) at netbsd:cpu_Debugger+0x4
> panic(c0695840,0,e8f000,0,3a38b1) at netbsd:panic+0x11d
> malloc(e8e2c4,c06cad40,0,0,3a38b1) at netbsd:malloc+0x167
> rf_MakeReconMap(c08d5000,80,0,1d1c5880,0) at netbsd:rf_MakeReconMap+0xc2
> rf_MakeReconControl(c0974900,0,0,0,2) at netbsd:rf_MakeReconControl+0x171
> rf_ContinueReconstructFailedDisk(c0974900,0,2,0,c20ac4e0) at netbsd:rf_ContinueReconstructFailedDisk+0xc1
> rf_ReconstructFailedDiskBasic(c08d5000,0,0,c08d5000,c088fe60) at netbsd:rf_ReconstructFailedDiskBasic+0xb9
> rf_ReconstructFailedDisk(c08d5000,0,0,1,c0100d22) at netbsd:rf_ReconstructFailedDisk+0x60
> rf_FailDisk(c08d5000,0,0,1,c42bd1b8) at netbsd:rf_FailDisk+0xc7
> rf_ReconThread(c0924ec0,7e0000,7e9000,0,c010030c) at netbsd:rf_ReconThread+0x43
> db>
> db> ps
> PID PPID PGRP UID S FLAGS LWPS COMMAND WAIT
> >398 0 0 0 2 0x20200 1 raid_recon
> 351 332 351 0 2 0x4002 1 raidctl
> 349 1 1 0 2 0x4000 1 getty nanosle
> 333 1 1 0 2 0x4000 1 getty nanosle
> 343 1 1 0 2 0x4000 1 getty nanosle
> 332 1 332 0 2 0x4003 1 csh pause
> 337 1 337 0 2 0 1 cron nanosle
> 330 1 330 0 2 0 1 inetd kqread
> 281 1 281 0 2 0 1 sshd select
> 171 1 171 0 2 0 1 rpcbind select
> 150 1 150 0 2 0 1 syslogd
> 120 1 120 0 2 0 1 dhclient select
> 8 0 0 0 2 0x20200 1 aiodoned aiodone
> 7 0 0 0 2 0x20200 1 ioflush syncer
> 6 0 0 0 2 0x20200 1 reaper reaper
> 5 0 0 0 2 0x20200 1 pagedaemon pgdaemo
> 4 0 0 0 2 0x20200 1 lfs_writer lfswrit
> 3 0 0 0 2 0x20200 1 raidio0 raidiow
> 2 0 0 0 2 0x20200 1 raid0 rfwcond
> 1 0 1 0 2 0x4000 1 init wait
> 0 -1 0 0 2 0x20200 1 swapper
> db>
>
> Thanks in advance for any clues anyone can provide...
>
> +j
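On the "truncating spare disk" warning in the quoted message, a little arithmetic suggests it may be harmless. Assuming RAIDframe reserves 64 sectors at the start of each component (RF_PROTECTED_SECTORS; that value is an assumption here, not something shown in this thread) and rounds the per-disk size down to a whole number of stripe units (sectPerSU is 128 in the wd2a component label), the 488397105-sector wd1a partition works out to exactly the 488396928 blocks shown as numBlocks for wd2a:

```python
# PROTECTED_SECTORS is an assumed value for RAIDframe's reserved area at
# the front of each component; SECT_PER_SU comes from the wd2a component
# label (sectPerSU: 128).
PROTECTED_SECTORS = 64
SECT_PER_SU = 128


def usable_blocks(partition_sectors, protected=PROTECTED_SECTORS,
                  sect_per_su=SECT_PER_SU):
    """Drop the reserved sectors, then round down to whole stripe units."""
    usable = partition_sectors - protected
    return usable - usable % sect_per_su


wd1a_sectors = 488397105             # size of the 'a' partition in the disklabel
print(usable_blocks(wd1a_sectors))   # 488396928, the numBlocks in wd2a's label
```

If that is what's happening, the spare is just being trimmed to the per-disk size the array already uses, and the warning is probably unrelated to the panic.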
--
Jeff Rizzo http://boogers.sf.ca.us/~riz