Subject: bin/34923: dump(8) only dumps a corefile with -X (snapshots)
To: None <gnats-admin@netbsd.org, netbsd-bugs@netbsd.org>
From: None <fanch@kekpar.net>
List: netbsd-bugs
Date: 10/26/2006 13:45:02
>Number:         34923
>Category:       bin
>Synopsis:       dump(8) only dumps a corefile with -X (snapshots)
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Oct 26 13:45:02 +0000 2006
>Originator:     Francois Brunel
>Release:        NetBSD 4.0_BETA
>Organization:
>Environment:
System: NetBSD anita.reso 4.0_BETA NetBSD 4.0_BETA (SUN4M.MP) #0: Thu Oct 19 20:14:22 CEST 2006 root@ptitordi:/usr/tmp/netbsd4/obj/mnt/nfs/sys/arch/sparc/compile/SUN4M.MP sparc
Architecture: sparc
Machine: sparc
>Description:
	When dumping an entire filesystem with snapshots (-X),
	shortly after Pass III begins, one dump process dumps core
	and the other dump processes hang until I abort the
	operation. The problem is reproducible every time, and
	disappears when I don't give the snapshot option to dump.

	After analysing a core file of dump on a sparc MP, it
	appears that cdesc (tape.c), used to cache data read from
	the disk, is modified by one process while in use by another.

	Seeing that cdesc is static and protected with flock()
	(in bread()), I walked through the source and saw that, when
	you use snapshots, diskfd (the global descriptor on which the
	locks are taken) is opened in main.c with:

	diskfd = snap_open(mntinfo->f_mntonname, snap_backup, &tnow);
	(main.c:446)

	After this, child processes are forked, and share cdesc. But
	these children, to get a new seek pointer, close diskfd and
	reopen it with:

	if ((diskfd = open(disk, O_RDONLY)) < 0) (tape.c:836)

	as if no snapshot option had been given. The children then
	call bread() (which uses cdesc) (tape.c:846).

	But the parent calls dumpino() at main.c:598 and 618, which
	calls dumpindir() (traverse.c:584), which calls bread().

	So, if my assumption that these file descriptors point to
	different file objects is correct (and the cdesc corruption
	makes me think so), what do I get on the tape? Data read from
	the snapshot, from the disk, or a mix?
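
	To illustrate why locks taken on different file objects
	would not protect a shared cache, here is a minimal sketch.
	It is not dump's code: make_scratch() and lock_excludes() are
	hypothetical helpers, and two scratch files stand in for the
	snapshot and the disk device. It only shows that flock()
	excludes lockers of the same file, not lockers of another one.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* Create an empty scratch file from a mkstemp(3) template; 0 on success. */
int
make_scratch(char *tmpl)
{
	int fd;

	assert(tmpl != NULL);
	if ((fd = mkstemp(tmpl)) < 0)
		return -1;
	close(fd);
	return 0;
}

/*
 * Hold an exclusive flock(2) on `held` through one descriptor, then try
 * a non-blocking exclusive lock on `probe` through a separately open()ed
 * descriptor.  Returns 1 if the second locker is excluded, 0 if it gets
 * the lock anyway, -1 on an unexpected error.
 */
int
lock_excludes(const char *held, const char *probe)
{
	int fd1, fd2, rv;

	if ((fd1 = open(held, O_RDONLY)) < 0)
		return -1;
	if ((fd2 = open(probe, O_RDONLY)) < 0) {
		close(fd1);
		return -1;
	}

	if (flock(fd1, LOCK_EX) < 0)
		rv = -1;			/* could not take the first lock */
	else if (flock(fd2, LOCK_EX | LOCK_NB) == 0)
		rv = 0;				/* no mutual exclusion */
	else if (errno == EWOULDBLOCK)
		rv = 1;				/* second locker is kept out */
	else
		rv = -1;

	close(fd2);				/* closing drops the locks */
	close(fd1);
	return rv;
}
```

	With two scratch files a and b created by make_scratch(),
	lock_excludes(a, a) should return 1 and lock_excludes(a, b)
	should return 0 on a system with BSD flock() semantics. If
	the parent locks the snapshot while the children lock the
	disk, they would be in the second case.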

>How-To-Repeat:
	dump -0 -autX -h 0 -f /dev/rst0 /export
	/export is ffs on RAID 1.

>Fix:
	dump child processes should reopen the snapshot file object
	when snapshots are in use.