Subject: bin/34923: dump(8) only dumps a corefile with -X (snapshots)
To: None <gnats-admin@netbsd.org, netbsd-bugs@netbsd.org>
From: None <fanch@kekpar.net>
List: netbsd-bugs
Date: 10/26/2006 13:45:02
>Number: 34923
>Category: bin
>Synopsis: dump(8) only dumps a corefile with -X (snapshots)
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: bin-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Oct 26 13:45:02 +0000 2006
>Originator: Francois Brunel
>Release: NetBSD 4.0_BETA
>Organization:
>Environment:
System: NetBSD anita.reso 4.0_BETA NetBSD 4.0_BETA (SUN4M.MP) #0: Thu Oct 19 20:14:22 CEST 2006 root@ptitordi:/usr/tmp/netbsd4/obj/mnt/nfs/sys/arch/sparc/compile/SUN4M.MP sparc
Architecture: sparc
Machine: sparc
>Description:
When dumping an entire filesystem with snapshots (-X),
shortly after Pass III begins, one dump process dumps core
and the other dump processes hang until I abort the
operation. The problem occurs every time, and disappears
when I don't give the snapshot option to dump.
After analysing a core file of dump on a sparc MP, it
appears that cdesc (tape.c), used to cache data read,
is modified by one process while in use by another.
Since cdesc is static and protected with flock()
(in bread()), I walked through the source and saw that,
when snapshots are used, diskfd (the global descriptor on
which the locks are taken) is opened in main.c with:
diskfd = snap_open(mntinfo->f_mntonname, snap_backup, &tnow);
(main.c:446)
After this, child processes are forked, and they share cdesc.
But these children, to get a new seek pointer, close diskfd
and reopen it with:
if ((diskfd = open(disk, O_RDONLY)) < 0) (tape.c:836)
That is, the children reopen the disk as if no snapshot
option had been given. They then call bread() (which uses
cdesc) (tape.c:846).
Meanwhile the parent calls dumpino() at main.c:598 and 618,
which calls dumpindir() (traverse.c:584), which calls bread().
So, if my assumption that these file descriptors point to
different file objects is correct (and the cdesc corruption
makes me think so), what do I get on tape? Data read from the
snapshot, from the disk, or a mix?
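To illustrate why locking through descriptors that refer to
different file objects would not serialize access to cdesc,
here is a minimal standalone sketch. It is not dump code: the
helper names make_temp_file() and second_lock_succeeds() are
hypothetical, and two temporary files stand in for the
snapshot and the disk. It only shows that flock() excludes
lockers of the same file, not lockers of another file.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: create an empty temporary file from a
 * mkstemp() template. Returns 0 on success, -1 on failure. */
int make_temp_file(char *tmpl)
{
    int fd = mkstemp(tmpl);
    if (fd < 0)
        return -1;
    close(fd);
    return 0;
}

/* Hypothetical helper: hold an exclusive flock() on path_a, then
 * try a non-blocking exclusive flock() on path_b.
 * Returns 1 if the second lock was granted, 0 if it was refused
 * with EWOULDBLOCK, and -1 on any other error. */
int second_lock_succeeds(const char *path_a, const char *path_b)
{
    int fd_a, fd_b, result = -1;

    assert(path_a != NULL && path_b != NULL);

    fd_a = open(path_a, O_RDONLY);
    if (fd_a < 0)
        return -1;
    fd_b = open(path_b, O_RDONLY);
    if (fd_b < 0) {
        close(fd_a);
        return -1;
    }

    if (flock(fd_a, LOCK_EX) == 0) {
        if (flock(fd_b, LOCK_EX | LOCK_NB) == 0)
            result = 1;         /* no mutual exclusion */
        else if (errno == EWOULDBLOCK)
            result = 0;         /* second locker is kept out */
    }

    /* Closing the descriptors also releases the locks. */
    close(fd_a);
    close(fd_b);
    return result;
}
```

Calling second_lock_succeeds() with the same path twice should
report that the second lock is refused, while calling it with
two different paths should report that both locks are granted.
If the parent locks the snapshot and the children lock the
disk, they would be in the second situation.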
>How-To-Repeat:
dump -0 -autX -h 0 -f /dev/rst0 /export
/export is ffs on RAID 1.
>Fix:
dump child processes should reopen the snapshot file object
when snapshots are in use.
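
A rough sketch of the idea, not a patch against dump: the
function reopen_source() and its snapshot_path parameter are
hypothetical, and I assume the child can be handed a path that
names the snapshot (how dump would obtain it after snap_open()
is not shown here). The child still gets its own seek pointer
by reopening, but it reopens the same object the parent reads.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: close oldfd (if valid) and return a fresh
 * read-only descriptor with a private seek pointer.
 * snapshot_path is assumed to be NULL when -X was not given;
 * otherwise it is opened in place of the disk. */
int reopen_source(int oldfd, const char *disk, const char *snapshot_path)
{
    const char *path;

    assert(disk != NULL);

    path = (snapshot_path != NULL) ? snapshot_path : disk;
    if (oldfd >= 0)
        close(oldfd);
    return open(path, O_RDONLY);
}
```

A child would then call something like
diskfd = reopen_source(diskfd, disk, snapshot_path) where it
currently calls open(disk, O_RDONLY).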