Subject: bin/6607: dump est could be better
To: None <gnats-bugs@gnats.netbsd.org>
From: None <bgrayson@ece.utexas.edu>
List: netbsd-bugs
Date: 12/17/1998 22:09:45
>Number:         6607
>Category:       bin
>Synopsis:       dump estimate of blocks could be better when doing subset of fs
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    bin-bug-people (Utility Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Dec 17 20:20:01 1998
>Last-Modified:
>Originator:     Brian Grayson
>Organization:
	Parallel and Distributed Systems
	Electrical and Computer Engineering
	The University of Texas at Austin
>Release:        Dec 17, 1998
>Environment:
	<Use :.!uname -a to embed this>

>Description:
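	The estimating code for dump, when dumping only a subset
	of a filesystem, counts most directories more often than
	necessary.  traverse.c opens its fts traversal with
	FTS_SEEDOT, so a directory appears to be counted again
	each time it turns up as a "." or ".." entry.  In most
	cases the effect is small, but it is possible to create
	a directory tree for which the estimate is off by an
	arbitrary factor (a 10x overestimate is shown below).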
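
A minimal sketch of the overcounting, assuming an fts(3) that stats "." and ".." entries when FTS_SEEDOT is set (the BSD-derived implementations I know of do).  count_root_hits() and demo_hits() are hypothetical helpers written for this illustration and are not part of dump; the /tmp path template is likewise an assumption.

```c
#define _DEFAULT_SOURCE		/* expose mkdtemp() on glibc; harmless elsewhere */
#include <sys/types.h>
#include <sys/stat.h>
#include <fts.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: walk `root` with the given fts options and count
 * the entries that refer to the same inode as `root` itself.  Post-order
 * (FTS_DP) visits are ignored, as in dump's traversal loop.
 */
static long
count_root_hits(const char *root, int opts)
{
	char *dirv[2];
	struct stat sb;
	FTS *dirh;
	FTSENT *entry;
	long hits = 0;

	if (stat(root, &sb) == -1)
		return -1;
	dirv[0] = (char *)root;
	dirv[1] = NULL;
	if ((dirh = fts_open(dirv, opts, NULL)) == NULL)
		return -1;
	while ((entry = fts_read(dirh)) != NULL) {
		switch (entry->fts_info) {
		case FTS_DP:	/* post-order visit of a directory */
		case FTS_NS:	/* stat failed, fts_statp not usable */
		case FTS_ERR:
			continue;
		}
		if (entry->fts_statp->st_ino == sb.st_ino &&
		    entry->fts_statp->st_dev == sb.st_dev)
			hits++;
	}
	fts_close(dirh);
	return hits;
}

/*
 * Hypothetical driver: build a scratch directory with `n` empty
 * subdirectories, count the hits on the top directory, then clean up.
 */
static long
demo_hits(int n, int opts)
{
	char root[] = "/tmp/ftsdemoXXXXXX";
	char path[64];
	long hits;
	int i;

	if (mkdtemp(root) == NULL)
		return -1;
	for (i = 0; i < n; i++) {
		snprintf(path, sizeof(path), "%s/d%d", root, i);
		if (mkdir(path, 0700) == -1)
			return -1;
	}
	hits = count_root_hits(root, opts);
	for (i = 0; i < n; i++) {
		snprintf(path, sizeof(path), "%s/d%d", root, i);
		(void)rmdir(path);
	}
	(void)rmdir(root);
	return hits;
}
```

With the option mask used in traverse.c (FTS_PHYSICAL|FTS_SEEDOT|FTS_XDEV), the value returned by demo_hits() should grow with n, because the ".." entry of every subdirectory resolves to the top directory.  Without FTS_SEEDOT it should stay at 1.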
	
>How-To-Repeat:
	# mkdir /tmp/testdir
	# mkdir /tmp/testdir/d{0,1,2,3,4,5,6,7,8,9}{0,1,2,3,4,5,6,7,8,9}{0,1,2,3,4,5,6,7,8,9}
	# /sbin/dump 0f /dev/null /tmp/testdir
	  DUMP: Dumping sub files/directories from /
	...
	  DUMP: estimated 21059 tape blocks on 0.54 tape(s).
	...
	  DUMP: 2037 tape blocks on 1 volume

	# dump.new_a 0f /dev/null /tmp/testdir
	...
	  DUMP: estimated 2039 tape blocks on 0.05 tape(s).
	...
	  DUMP: 2037 tape blocks on 1 volume

	(Patches A, B, and C all estimate 2039 blocks, against
	2037 actually written.  However, I am not sure which
	patch is best.  "A" is the safest, but I'm not sure
	whether we want to be looking at FTS_DOT/FTS_SEEDOT
	entries in the first place.)
	
>Fix:
	
Patch A:  in mapfileino(), return immediately if mapfileino() has
already been called on this inode (its bit is set in usedinomap).
--- traverse.c.dist	Thu Dec 17 21:38:54 1998
+++ traverse.c	Thu Dec 17 22:00:51 1998
@@ -149,6 +149,9 @@
 	int mode;
 	struct dinode *dp;
 
+	/*  If we've already looked at this inode, then
+	 *  short-circuit and return.  */
+	if (TSTINO(ino, usedinomap)) return;
 	dp = getino(ino);
 	if ((mode = (dp->di_mode & IFMT)) == 0)
 		return;
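
The idea behind Patch A can be sketched in isolation as follows.  This is only an illustration under stated assumptions: seenmap, SEEN_TST, SEEN_SET, MAXINO, and map_inode() are hypothetical stand-ins, not dump's real usedinomap, TSTINO, or mapfileino().

```c
#include <limits.h>

/* Hypothetical fixed-size inode bitmap; dump sizes its maps from the fs. */
#define MAXINO	4096
static unsigned char seenmap[MAXINO / CHAR_BIT];

#define SEEN_TST(ino)	(seenmap[(ino) / CHAR_BIT] &  (1u << ((ino) % CHAR_BIT)))
#define SEEN_SET(ino)	(seenmap[(ino) / CHAR_BIT] |= (1u << ((ino) % CHAR_BIT)))

static long estimate;	/* running block estimate */

/*
 * Add `blocks` to the estimate the first time `ino` is seen and do
 * nothing on later calls.  Returns the running estimate.
 */
static long
map_inode(unsigned int ino, long blocks)
{
	if (ino >= MAXINO)	/* out of range for this toy bitmap */
		return estimate;
	if (SEEN_TST(ino))	/* already counted: short-circuit */
		return estimate;
	SEEN_SET(ino);
	estimate += blocks;
	return estimate;
}
```

Because the check is made per inode, it holds no matter how many times the traversal hands the same directory back, which is presumably why this variant is the most conservative of the three.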

Patch B:  don't pass the FTS_SEEDOT option to the fts_open() call:
--- traverse.c.dist	Thu Dec 17 21:38:54 1998
+++ traverse.c	Thu Dec 17 22:01:25 1998
@@ -193,7 +193,7 @@
 			msg("Can't determine cwd: %s\n", strerror(errno));
 			dumpabort(0);
 		}
-		if ((dirh = fts_open(dirv, FTS_PHYSICAL|FTS_SEEDOT|FTS_XDEV,
+		if ((dirh = fts_open(dirv, FTS_PHYSICAL|FTS_XDEV,
 		    		    NULL)) == NULL) {
 			msg("fts_open failed: %s\n", strerror(errno));
 			dumpabort(0);

Patch C:  when doing the traversal, just ignore the "." and ".."
entries that fts reports as FTS_DOT, i.e. those that weren't
specified in the fts_open() call:
--- traverse.c.dist	Thu Dec 17 21:38:54 1998
+++ traverse.c	Thu Dec 17 22:02:59 1998
@@ -205,6 +205,7 @@
 			case FTS_NS:
 				msg("Can't fts_read %s: %s\n", entry->fts_path,
 				    strerror(errno));
+			case FTS_DOT:		/*  Skip it.  */
 			case FTS_DP:		/* already seen dir */
 				continue;
 			}
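
The effect of Patch C on the switch can be sketched as a small classifier.  should_map() is a hypothetical name used only for this illustration; the real loop uses `continue` and also logs a message in the FTS_NS case before falling through.

```c
#include <fts.h>

/*
 * Return 1 if an entry with this fts_info value should be handed to the
 * estimator, 0 if the traversal loop should skip it.
 */
static int
should_map(int fts_info)
{
	switch (fts_info) {
	case FTS_NS:	/* stat failed */
	case FTS_DOT:	/* "." or ".." not named in fts_open() */
	case FTS_DP:	/* directory already seen in pre-order */
		return 0;
	}
	return 1;
}
```

Placing the FTS_DOT label between FTS_NS and FTS_DP relies on the existing fall-through, so all three cases share the single skip statement.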

>Audit-Trail:
>Unformatted: