tech-kern archive


Re: fssconfig/raidframe/dump-related crashes



On Fri, 11 Mar 2016, Timo Buhrmester wrote:

My 7-stable/amd64 server crashes nearly every night while my backup
routine is in progress.  There's no backtrace and no crash dump is saved,
but the console reads:
ohci1: 1 scheduling overruns
ohci1: WARNING: addr 0x01cf0000 not found
ohci1: WARNING: addr 0x012c0000 not found
ohci1: WARNING: addr 0x01cf0000 not found
ohci1: WARNING: addr 0x01d50000 not found
ohci1: WARNING: addr 0x01d70000 not found
ohci1: WARNING: addr 0x01d60000 not found
ohci1: WARNING: addr 0x012c0000 not found
ohci1: WARNING: addr 0x01cf0000 not found
ohci1: 44 scheduling overruns
[more of these]
ohci1: 46 scheduling overruns
before it reboots (sometimes hangs instead).

Probably, these messages are /not/ traces of the root cause, since the
machine will also crash with a kernel with no ohci support compiled in
whatsoever - the crashes are silent, then.

Perhaps you can get it to provide a backtrace rather than simply rebooting?
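
One possible way to do that, assuming the kernel is built with "options DDB" and the console is reachable when the crash happens, is to have the kernel enter ddb(4) on panic instead of rebooting. A sketch of the /etc/sysctl.conf setting (this only helps if the crash really is a panic, which is not established here):

```
# /etc/sysctl.conf fragment: enter the in-kernel debugger on panic.
# Assumes a kernel with "options DDB"; at the db> prompt, "bt" prints
# the backtrace.
ddb.onpanic=1
```

The same value can be set on a running system with "sysctl -w ddb.onpanic=1".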


It happens while dump(8)ing an in-filesystem fss(4)-snapshot of an empty-ish
FFSv1 (fslevel 4) filesystem sitting on a raid(4)-1 with two components.

There should not, conceptually, be a problem with dumping a fss device,
right?

The command my script runs to create the snapshot is
# fssconfig -cx fss0 /stor /stor/snapshot
and the dump
# dump -$lvl -uant -h 0 -L "$nam" -f - /dev/rfss0 >/tmp/dumpfifo
where /tmp/dumpfifo is a fifo from which
# gzip -1 </tmp/dumpfifo >/var/tmp/dump.gz
reads.  (I don't remember the reason for going via a fifo, but there
was one...)
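
For reference, the fifo arrangement described above can be sketched as a small shell function. This is only an illustration, not the actual backup script: via_fifo and its arguments are made-up names, and the fifo lives in a temporary directory rather than at /tmp/dumpfifo.

```shell
#!/bin/sh
# Sketch: feed a writer command's stdout through a fifo into gzip.
# via_fifo is a hypothetical helper; $1 is the output file, the
# remaining arguments are the writer command.
via_fifo() {
    out=$1
    shift
    dir=$(mktemp -d) || return 1
    fifo=$dir/dumpfifo
    mkfifo "$fifo" || { rm -rf "$dir"; return 1; }

    # Reader: started first, in the background. Opening the fifo
    # blocks until a writer opens the other end.
    gzip -1 <"$fifo" >"$out" &
    reader=$!

    # Writer: runs in the foreground with stdout going into the fifo.
    w=0
    "$@" >"$fifo" || w=$?

    # Wait for gzip to drain the fifo and collect its exit status.
    r=0
    wait "$reader" || r=$?

    rm -rf "$dir"
    [ "$w" -eq 0 ] && [ "$r" -eq 0 ]
}

# Hypothetical usage, with the dump command from above:
#   via_fifo /var/tmp/dump.gz dump -$lvl -uant -h 0 -L "$nam" -f - /dev/rfss0
```

Starting the reader before the writer avoids either side blocking forever in open(2) on the fifo.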

Any suggestions where I could start looking?  So far, I've tried running
a DEBUG kernel but that didn't provide additional information.
The filesystem is clean as far as fsck_ffs is concerned, too.


I use the built-in snapshot capability of dump(8), with the -X option:

	dump -0au -h 0 -X -f- ${fs} | gzip -9 > $outfile

and it hasn't had any issues. But I'm not using raid, so that could be a factor.
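
One caveat with a plain pipeline like that: the shell reports only gzip's exit status, so a failing dump can go unnoticed. A minimal sketch of a wrapper that also checks the producer's status follows; compress_stream is a made-up name, and this is an illustration rather than the script I actually run.

```shell
#!/bin/sh
# Sketch: run a producer command, gzip its stdout into a file, and fail
# if either the producer or gzip fails. compress_stream is a
# hypothetical helper; $1 is the output file, the rest is the producer.
compress_stream() {
    out=$1
    shift
    status_file=$(mktemp) || return 1

    # The pipeline's own status is gzip's, so the producer's exit
    # status is recorded in a temporary file.
    { rc=0; "$@" || rc=$?; echo "$rc" >"$status_file"; } | gzip -9 >"$out"
    gzip_status=$?

    producer_status=$(cat "$status_file")
    rm -f "$status_file"

    # Treat a missing status as a failure.
    if [ "${producer_status:-1}" -ne 0 ] || [ "$gzip_status" -ne 0 ]; then
        rm -f "$out"    # do not leave a truncated archive behind
        return 1
    fi
    return 0
}

# Hypothetical usage, with fs and outfile as in the command above:
#   compress_stream "$outfile" dump -0au -h 0 -X -f - "$fs"
```

Note that with -u, dump may already have updated /etc/dumpdates by the time a gzip failure is detected, so a failed run should be repeated at the same level.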


+------------------+--------------------------+------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:      |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org |
+------------------+--------------------------+------------------------+

