NetBSD-Users archive


fssconfig/raidframe/dump-related crashes



My 7-stable/amd64 server crashes nearly every night while my backup
routine is in progress.  There's no backtrace and no crash dump is saved,
but the console reads:
> ohci1: 1 scheduling overruns
> ohci1: WARNING: addr 0x01cf0000 not found
> ohci1: WARNING: addr 0x012c0000 not found
> ohci1: WARNING: addr 0x01cf0000 not found
> ohci1: WARNING: addr 0x01d50000 not found
> ohci1: WARNING: addr 0x01d70000 not found
> ohci1: WARNING: addr 0x01d60000 not found
> ohci1: WARNING: addr 0x012c0000 not found
> ohci1: WARNING: addr 0x01cf0000 not found
> ohci1: 44 scheduling overruns
> [more of these]
> ohci1: 46 scheduling overruns
before it reboots (sometimes it hangs instead).

These messages are probably /not/ traces of the root cause, since the
machine also crashes with a kernel that has no ohci support compiled in
at all; the crashes are then silent.

The crash happens while dump(8)ing an fss(4) snapshot, whose backing store
lives inside the snapshotted filesystem, of a mostly empty FFSv1 (fslevel 4)
filesystem sitting on a two-component raid(4) RAID 1 set.

There should not, conceptually, be a problem with dumping an fss device,
right?

The command my script runs to create the snapshot is
# fssconfig -cx fss0 /stor /stor/snapshot
and the dump command is
# dump -$lvl -uant -h 0 -L "$nam" -f - /dev/rfss0 >/tmp/dumpfifo
where /tmp/dumpfifo is a fifo from which
# gzip -1 </tmp/dumpfifo >/var/tmp/dump.gz
reads.  (I don't remember the reason for going via a fifo, but there
was one...)
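
For reference, the three commands above can be strung together roughly as
follows.  This is only a minimal sketch, not the actual script: the function
name backup, the variables fifo and out, the mkfifo step, the exit-status
handling and the final "fssconfig -u fss0" (which unconfigures the snapshot
device) are assumptions added for illustration.  $lvl and $nam are expected
to be set by the caller, as in the commands quoted above.

```sh
#!/bin/sh
# Sketch of the snapshot + dump + gzip sequence described above.
# Assumed names (not from the original script): backup, fifo, out.

fifo=${fifo:-/tmp/dumpfifo}
out=${out:-/var/tmp/dump.gz}

backup() {
    # Refuse to run without a dump level and a label.
    [ -n "${lvl-}" ] && [ -n "${nam-}" ] || return 1

    # Create the snapshot; -x unlinks the backing store once it is set up.
    fssconfig -cx fss0 /stor /stor/snapshot || return 1

    # Assumption: create the fifo if it does not exist yet.
    [ -p "$fifo" ] || mkfifo "$fifo" || { fssconfig -u fss0; return 1; }

    # Start the reader first so that the writer's open() does not block forever.
    gzip -1 <"$fifo" >"$out" &
    gz_pid=$!

    dump -"$lvl" -uant -h 0 -L "$nam" -f - /dev/rfss0 >"$fifo"
    dump_rc=$?

    wait "$gz_pid"
    gz_rc=$?

    # Assumption: tear the snapshot down again once the dump is finished.
    fssconfig -u fss0

    [ "$dump_rc" -eq 0 ] && [ "$gz_rc" -eq 0 ]
}
```

It would be called as, for example, "lvl=0 nam=stor backup".  If no reason
for the fifo turns up, piping dump straight into gzip should be equivalent
and would take one moving part out of the picture while debugging.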

Any suggestions on where I could start looking?  So far, I've tried running
a DEBUG kernel, but that didn't provide any additional information.
The filesystem is clean as far as fsck_ffs is concerned, too.


Here's some information on the filesystem:

# mount -v | grep /stor
/dev/raid0g on /stor type ffs (log, noatime, local, fsid: 0x1206/0x78b, reads: sync 8489 async 0, writes: sync 0 async 1791)


# df -h /stor
Filesystem         Size       Used      Avail %Cap Mounted on
/dev/raid0g        416G        19G       376G   4% /stor


# dumpfs -s /stor
file system: /dev/rraid0g
format	FFSv1
endian	little-endian
magic	11954   	time	Fri Mar 11 05:55:54 2016
superblock location	8192	id	[ 564b7b58 793a9223 ]
cylgrp	dynamic	inodes	4.4BSD	sblock	FFSv2	fslevel 4
nbfree	12996876	ndir	56002	nifree	26708703	nffree	4718
ncg	580	size	109891568	blocks	109028517
bsize	32768	shift	15	mask	0xffff8000
fsize	4096	shift	12	mask	0xfffff000
frag	8	shift	3	fsbtodb	3
bpg	23684	fpg	189472	ipg	47104
minfree	5%	optim	time	maxcontig 2	maxbpg	8192
symlinklen 60	contigsumsize 2
maxfilesize 0x004002001005ffff
nindir	8192	inopb	256
avgfilesize 16384	avgfpdir 64
sblkno	8	cblkno	16	iblkno	24	dblkno	1496
sbsize	4096	cgsize	32768
csaddr	1496	cssize	12288
cgrotor	0	fmod	0	ronly	0	clean	0x02
wapbl version 0x1	location 2	flags 0x0
wapbl loc0 439587072	loc1 131072	loc2 512	loc3 3
flags	wapbl 
fsmnt	/stor
volname		swuid	0


# raidctl -sv raid0
Components:
           /dev/wd0a: optimal
           /dev/wd1a: optimal
No spares.
Component label for /dev/wd0a:
   Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
   Version: 2, Serial Number: 2015111701, Mod Counter: 1213
   Clean: No, Status: 0
   sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 913211264
   RAID Level: 1
   Autoconfig: Yes
   Root partition: Force
   Last configured as: raid0
Component label for /dev/wd1a:
   Row: 0, Column: 1, Num Rows: 1, Num Columns: 2
   Version: 2, Serial Number: 2015111701, Mod Counter: 1213
   Clean: No, Status: 0
   sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 913211264
   RAID Level: 1
   Autoconfig: Yes
   Root partition: Force
   Last configured as: raid0
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
# exit

