Subject: Weird hang with dump
To: None <current-users@netbsd.org>
From: Tom Ivar Helbekkmo <tih@eunetnorge.no>
List: current-users
Date: 11/19/2003 08:53:54
I'm suddenly seeing something strange when I try to back up my file
systems to tape, here...  The system is running i386-current, and has
a number of small SCSI disks, set up as mirror pairs using RAIDframe:

/dev/raid0a     /            ffs   rw,softdep         1 1
/dev/raid1b      none        swap  sw                 0 0
swap            /tmp         mfs   rw,-s524288        0 0
/dev/raid0e     /var         ffs   rw,softdep         1 2
/dev/raid2e     /usr         ffs   rw,softdep         1 2
/dev/raid3e     /u           ffs   rw,softdep         1 2
/dev/raid4e     /usr/local   ffs   rw,softdep         1 2

Filesystem    Size     Used     Avail Capacity  Mounted on
/dev/raid0a   247M      37M      198M    15%    /
/dev/raid0e   2.0G     185M      1.7G     9%    /var
/dev/raid2e   4.9G     1.7G      3.0G    35%    /usr
mfs:316       248M     6.0K      236M     0%    /tmp
/dev/raid3e   8.3G     5.1G      2.8G    64%    /u
/dev/raid4e    34G      20G       12G    62%    /usr/local

(The last one, raid4, is a RAID 5 set.)

I back this stuff up to tape regularly, using this sequence of commands:

dump 0ubBhf 64 20000000 0 /dev/nrst1 /
dump 0ubBhf 64 20000000 0 /dev/nrst1 /var
dump 0ubBhf 64 20000000 0 /dev/nrst1 /u
dump 0ubBhf 64 20000000 0 /dev/nrst1 /usr/local
dump 0ubBhf 64 20000000 0 /dev/nrst1 /usr
mt -f /dev/rst1 rewoffl

This gives me a complete set of dumps on a single tape.

After my last update to -current, though, the dump sequence never
finishes.  Sometime during the dump of the /u file system (but at a
different spot on each retry), it just stops, with no error messages
to the dump user, on the console, or in /var/log/*.  Hitting ^T shows
that dump is sleeping in a "pause" wait state:

  DUMP: Found /dev/rraid3e on /u in /etc/fstab
  DUMP: Date of this level 0 dump: Tue Nov 18 04:33:16 2003
  DUMP: Date of last level 0 dump: the epoch
  DUMP: Dumping /dev/rraid3e (/u) to /dev/nrst1
  DUMP: Label: none
  DUMP: mapping (Pass I) [regular files]
  DUMP: mapping (Pass II) [directories]
  DUMP: estimated 5348023 tape blocks on 0.27 tape(s).
  DUMP: Volume 1 started at: Tue Nov 18 04:33:17 2003
  DUMP: dumping (Pass III) [directories]
  DUMP: dumping (Pass IV) [regular files]
  DUMP: 20.72% done, finished in 0:19
  DUMP: 39.72% done, finished in 0:15
  DUMP: 55.63% done, finished in 0:11
load: 0.25  cmd: dump 29811 [pause] 13.84u 22.45s 0% 16768k
58.40% done at 344 KB/s, finished in 1:47
^C  DUMP: Interrupt received.
  DUMP: Do you want to abort dump?: ("yes" or "no") yes
  DUMP: The ENTIRE dump is aborted.
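
As a rough cross-check on the ^T line above, the elapsed time it
implies can be recovered from the figures dump printed.  This is only
a sketch: it assumes that dump's "tape blocks" are 1 KB each and that
the KB/s figure is an average since the volume started rather than a
current rate.  implied_elapsed_min is a made-up helper name, not part
of dump.

```sh
#!/bin/sh
# implied_elapsed_min TOTAL_BLOCKS PERCENT_DONE RATE_KBPS
# Hypothetical helper: prints the elapsed time, in whole minutes, that
# a dump status line implies.  Assumes 1 KB tape blocks and that
# RATE_KBPS is averaged since the volume started.
implied_elapsed_min() {
    awk -v total="$1" -v pct="$2" -v rate="$3" 'BEGIN {
        written = total * pct / 100      # KB written so far
        printf "%d\n", written / rate / 60
    }'
}

# Figures from the output above: 5348023 blocks, 58.40% done, 344 KB/s.
implied_elapsed_min 5348023 58.40 344
```

With those figures this comes to about 151 minutes, i.e. roughly two
and a half hours since the volume started at 04:33:17.  Under the
stated assumptions, the low average rate would be consistent with a
long stall after a normal start, but it cannot by itself show whether
any data is still being written.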

Here, there was about a two-hour pause between the last normal output
from dump and the ^T output.  I haven't done anything yet to find out
whether it's completely hung or just progressing at an incredibly
slow rate.
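
One low-effort way to tell those two cases apart would be to sample a
dump process's accumulated CPU time twice and see whether it moves.
The sketch below does that with ps(1).  cpu_time and is_progressing
are hypothetical helper names, and CPU time is only a rough proxy for
progress: dump runs as several cooperating processes, so each of them
may need checking.

```sh
#!/bin/sh
# cpu_time PID: print PID's accumulated CPU time as reported by ps(1),
# or nothing if the process no longer exists.
cpu_time() {
    ps -o time= -p "$1" 2>/dev/null | tr -d ' '
}

# is_progressing PID [SECONDS]: succeed if PID's CPU time changed
# during the interval (default 60 seconds).
is_progressing() {
    before=$(cpu_time "$1")
    sleep "${2:-60}"
    after=$(cpu_time "$1")
    [ -n "$before" ] && [ -n "$after" ] && [ "$before" != "$after" ]
}
```

For the process shown by ^T above, this might be run as:

    is_progressing 29811 300 && echo "still accumulating CPU time"

Hitting ^T again after a while and comparing the "% done" figures
would answer much the same question without any script.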

Anyone have any ideas as to what I might try to do to resolve this?

-tih
-- 
Tom Ivar Helbekkmo, Senior System Administrator, EUnet Norway
www.eunet.no  T: +47-22092958 M: +47-93013940 F: +47-22092901