Subject: proposed pax changes (was Big problems with snapshot/20000226, part 1)
To: None <tech-install@netbsd.org>
From: Simon Burge <simonb@netbsd.org>
List: tech-install
Date: 03/03/2000 12:44:36
Thor Lancelot Simon wrote:

> On Thu, Mar 02, 2000 at 08:17:10AM -0800, Paul Hoffman wrote:
> > I made the floppies from the directory, booted, and started a clean install 
> > using the default partitioning and just the non-X sets, getting them with 
> > FTP. This is on a Pentium system with a 13 Gig hard drive. Towards the end 
> > of the tar for base.tgz, it starts kicking out errors of not being able to 
> > set the date and permissions on folders it is creating because / is full. 
> > The directories it is complaining about are created, and they seem to have 
> > perfectly reasonable times on them, but after the base.tgz untarring, it 
> > says errors were encountered.
> 
> This is a totally silly bug related to the use of 'pax' instead of 'tar'
> on the install floppies.  As it extracts, pax writes out a list of file
> times and permissions into a file in /tmp so it can apply them in one
> pass at the end of the extraction.  Unfortunately, this file doesn't fit
> on the MFS's /tmp.
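Here is why pax defers this work. Creating an entry inside a directory updates that directory's mtime. A time restored as soon as the directory is extracted would therefore be clobbered by every later member extracted into it. For a non-root user, a restrictive mode applied early could also block extraction into the directory entirely.

The following is a small C sketch of the first effect. It is not pax code: the helper name and the /tmp template are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L

#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical demo: restore an old mtime on a fresh directory, then
 * create a file inside it.  Returns 1 if the restored mtime was lost,
 * 0 if it survived, -1 on error.
 */
static int
mtime_clobbered_by_create(void)
{
	char dir[] = "/tmp/paxdemoXXXXXX";
	char file[sizeof(dir) + 2];	/* room for "/f" */
	struct timespec ts[2];
	struct stat sb;
	int fd, rv = -1;

	if (mkdtemp(dir) == NULL)
		return -1;

	/* Set atime and mtime as an extractor would for the directory... */
	ts[0].tv_sec = ts[1].tv_sec = 1000000000;	/* 2001-09-09 */
	ts[0].tv_nsec = ts[1].tv_nsec = 0;
	if (utimensat(AT_FDCWD, dir, ts, 0) == -1)
		goto out;

	/* ...then extract a member into it, as later archive entries do. */
	(void)snprintf(file, sizeof(file), "%s/f", dir);
	if ((fd = open(file, O_WRONLY | O_CREAT | O_EXCL, 0644)) == -1)
		goto out;
	(void)close(fd);

	if (stat(dir, &sb) == 0)
		rv = (sb.st_mtime != ts[1].tv_sec);
	(void)unlink(file);
out:
	(void)rmdir(dir);
	return rv;
}
```

On an ordinary POSIX file system, mtime_clobbered_by_create() should return 1. That is why the times and modes are spooled and applied in a single pass once extraction has finished.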

How does the following look for a quick hack?  It adds an ``M'' flag
that says to keep temporary info in memory instead of on disk.  Note
that I've only implemented this for the directory temp file and not
for the file temp file, which is used for archive creation.  I can
either flesh out that part later if people are happy with this
concept, or just mark the ``M'' flag as being for file extraction
only.  I'd probably also tidy up the list handling if we want to use
this; I just wanted to hack something together as a proof of concept.
I also haven't looked closely at the illegal option flag sets at the
bottom of options.h to see if anything needs to be done for the new
option, and it's only implemented for pax mode: ``M'' is free for
tar, but cpio already uses both ``m'' and ``M''.
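Slotting ``M'' into options.h means renumbering every flag after CLF. The comment there requires bit N of the flag word to line up with character N of FLGCH.

A quick self-check of that invariant can be sketched in C. The values are copied from the options.h part of the patch. The helper flag_letter() is hypothetical, not anything in pax.

```c
#include <string.h>

/* Values as they read after this patch (options.h). */
#define FLGCH	"abcdfiklnoprstuvwxABDEGHLMPTUXYZ"
#define CLF	0x01000000UL
#define CMF	0x02000000UL
#define CPF	0x04000000UL
#define CZF	0x80000000UL

/*
 * Hypothetical helper: map a single-bit flag to the FLGCH character
 * at its bit position, or '?' if there is no such character.
 */
static int
flag_letter(unsigned long flag)
{
	size_t pos = 0;

	if (flag == 0)
		return '?';
	while ((flag & 1) == 0) {	/* find the set bit's position */
		flag >>= 1;
		pos++;
	}
	return pos < strlen(FLGCH) ? FLGCH[pos] : '?';
}
```

With CZF now at 0x80000000 and FLGCH at 32 characters, every bit of a 32-bit flag word is in use. A further new option would need some other way of recording it.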

I'm reluctant to just throw away the temporary disk file altogether
since there's a comment in tables.c saying:

	* ... The
	* goal was speed and the ability to work with HUGE archives.

That said, extracting base.tgz from i386 1.4.1 results in a 24437-byte
file in /tmp with info for 441 directories, or roughly 55 bytes per
directory.  With a decent datasize resource limit, HUGE would obviously
have to mean hundreds of thousands or millions of directories before
keeping this in memory became a problem.
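To put rough numbers on that, the figures above can be extrapolated linearly. The helpers below are hypothetical, and the estimate assumes the per-directory cost stays constant. In practice the in-memory list will cost somewhat more per entry. Each DIRDATA node carries extra name and fow pointers, and every node and name is a separate malloc().

```c
/*
 * Hypothetical helpers for a back-of-the-envelope estimate; the
 * sample figures come from extracting i386 1.4.1 base.tgz.
 */
static double
bytes_per_dir(double spool_bytes, double ndirs)
{
	return spool_bytes / ndirs;
}

/* Assumes the per-directory cost stays constant as the count grows. */
static double
extrapolate_bytes(double ndirs, double per_dir)
{
	return ndirs * per_dir;
}
```

With these, extrapolate_bytes(1e6, bytes_per_dir(24437, 441)) comes to about 55.4 million bytes (roughly 53 MiB) for a million directories.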


If there are no objections to this, I'll also clear up the icky ANSI/K&R
goop in pax in line with the recently updated style guide...

Simon.
--

Index: extern.h
===================================================================
RCS file: /cvsroot/basesrc/bin/pax/extern.h,v
retrieving revision 1.22
diff -p -u -r1.22 extern.h
--- extern.h	2000/02/17 03:12:23	1.22
+++ extern.h	2000/03/03 01:19:46
@@ -229,6 +229,7 @@ extern int zflag;
 extern int Dflag;
 extern int Hflag;
 extern int Lflag;
+extern int Mflag;
 extern int Xflag;
 extern int Yflag;
 extern int Zflag;
Index: options.c
===================================================================
RCS file: /cvsroot/basesrc/bin/pax/options.c,v
retrieving revision 1.26
diff -p -u -r1.26 options.c
--- options.c	2000/02/17 03:12:25	1.26
+++ options.c	2000/03/03 01:19:47
@@ -209,7 +209,7 @@ pax_options(argc, argv)
 	/*
 	 * process option flags
 	 */
-	while ((c=getopt(argc,argv,"ab:cdf:iklno:p:rs:tuvwx:zAB:DE:G:HLPT:U:XYZ"))
+	while ((c=getopt(argc,argv,"ab:cdf:iklno:p:rs:tuvwx:zAB:DE:G:HLMPT:U:XYZ"))
 	    != -1) {
 		switch (c) {
 		case 'a':
@@ -480,6 +480,14 @@ pax_options(argc, argv)
 			 */
 			Lflag = 1;
 			flg |= CLF;
+			break;
+		case 'M':
+			/*
+			 * store file/directory info in memory instead of on
+			 * disk.
+			 */
+			Mflag = 1;
+			flg |= CMF;
 			break;
 		case 'P':
 			/*
Index: options.h
===================================================================
RCS file: /cvsroot/basesrc/bin/pax/options.h,v
retrieving revision 1.6
diff -p -u -r1.6 options.h
--- options.h	1999/11/01 17:13:27	1.6
+++ options.h	2000/03/03 01:19:47
@@ -82,18 +82,19 @@
 #define	CGF	0x00400000	/* nonstandard extension */
 #define	CHF	0x00800000	/* nonstandard extension */
 #define	CLF	0x01000000	/* nonstandard extension */
-#define	CPF	0x02000000	/* nonstandard extension */
-#define	CTF	0x04000000	/* nonstandard extension */
-#define	CUF	0x08000000	/* nonstandard extension */
-#define	CXF	0x10000000
-#define	CYF	0x20000000	/* nonstandard extension */
-#define	CZF	0x40000000	/* nonstandard extension */
+#define	CMF	0x02000000	/* nonstandard extension */
+#define	CPF	0x04000000	/* nonstandard extension */
+#define	CTF	0x08000000	/* nonstandard extension */
+#define	CUF	0x10000000	/* nonstandard extension */
+#define	CXF	0x20000000
+#define	CYF	0x40000000	/* nonstandard extension */
+#define	CZF	0x80000000	/* nonstandard extension */
 
 /*
  * ascii string indexed by bit position above (alter the above and you must
  * alter this string) used to tell the user what flags caused us to complain
  */
-#define FLGCH	"abcdfiklnoprstuvwxABDEGHLPTUXYZ"
+#define FLGCH	"abcdfiklnoprstuvwxABDEGHLMPTUXYZ"
 
 /*
  * legal pax operation bit patterns
Index: pax.1
===================================================================
RCS file: /cvsroot/basesrc/bin/pax/pax.1,v
retrieving revision 1.22
diff -p -u -r1.22 pax.1
--- pax.1	1999/11/07 15:57:31	1.22
+++ pax.1	2000/03/03 01:19:47
@@ -865,6 +865,9 @@ Follow only command line symbolic links 
 system traversal.
 .It Fl L
 Follow all symbolic links to perform a logical file system traversal.
+.It Fl M
+Store directory and file information in memory instead of in a temporary
+file in TMPDIR.
 .It Fl P
 Do not follow symbolic links, perform a physical file system traversal.
 This is the default mode.
Index: pax.c
===================================================================
RCS file: /cvsroot/basesrc/bin/pax/pax.c,v
retrieving revision 1.12
diff -p -u -r1.12 pax.c
--- pax.c	2000/02/17 03:12:25	1.12
+++ pax.c	2000/03/03 01:19:47
@@ -88,6 +88,7 @@ int	Aflag;			/* honor absolute path */
 int	Dflag;			/* same as uflag except inode change time */
 int	Hflag;			/* follow command line symlinks (write only) */
 int	Lflag;			/* follow symlinks when writing */
+int	Mflag;			/* store directory/file info in memory */
 int	Xflag;			/* archive files with same device id only */
 int	Yflag;			/* same as Dflg except after name mode */
 int	Zflag;			/* same as uflg except after name mode */
Index: tables.c
===================================================================
RCS file: /cvsroot/basesrc/bin/pax/tables.c,v
retrieving revision 1.12
diff -p -u -r1.12 tables.c
--- tables.c	2000/02/17 03:12:26	1.12
+++ tables.c	2000/03/03 01:19:47
@@ -83,6 +83,7 @@ static NAMT **ntab = NULL;	/* interactiv
 static DEVT **dtab = NULL;	/* device/inode mapping tables */
 static ATDIR **atab = NULL;	/* file tree directory time reset table */
 static int dirfd = -1;		/* storage for setting created dir time/mode */
+static DIRDATA *dirtab = NULL;	/* created dir time/mode memory storage */
 static u_long dircnt;		/* entries in dir time/mode storage */
 static int ffd = -1;		/* tmp file for file time table name storage */
 
@@ -1218,25 +1222,39 @@ int
 dir_start()
 #endif
 {
-	const char *tmpdir;
-	char template[MAXPATHLEN];
 
-	if (dirfd != -1)
-		return(0);
+	if (Mflag) {
+		if (dirtab != NULL)
+			return(0);
 
-	/*
-	 * unlink the file so it goes away at termination by itself
-	 */
-	if ((tmpdir = getenv("TMPDIR")) == NULL)
-		tmpdir = _PATH_TMP;
-	(void)snprintf(template, sizeof(template), "%s/%s", tmpdir, TMPFILE);
-	if ((dirfd = mkstemp(template)) >= 0) {
-		(void)unlink(template);
-		return(0);
+		if ((dirtab = (DIRDATA *)malloc(sizeof(DIRDATA))) != NULL) {
+			dirtab->name = NULL;	/* mark last record */
+			return(0);
+		}
+		tty_warn(1, "Unable to allocate memory for directory times");
+		return(-1);
+
+	} else {
+		const char *tmpdir;
+		char template[MAXPATHLEN];
+
+		if (dirfd != -1)
+			return(0);
+
+		/*
+		 * unlink the file so it goes away at termination by itself
+		 */
+		if ((tmpdir = getenv("TMPDIR")) == NULL)
+			tmpdir = _PATH_TMP;
+		(void)snprintf(template, sizeof(template), "%s/%s", tmpdir, TMPFILE);
+		if ((dirfd = mkstemp(template)) >= 0) {
+			(void)unlink(template);
+			return(0);
+		}
+		tty_warn(1, "Unable to create temporary file for directory times: %s",
+		    template);
+		return(-1);
 	}
-	tty_warn(1, "Unable to create temporary file for directory times: %s",
-	    template);
-	return(-1);
 }
 
 /*
@@ -1264,38 +1282,65 @@ add_dir(name, nlen, psb, frc_mode)
 	int frc_mode;
 #endif
 {
-	DIRDATA dblk;
 
-	if (dirfd < 0)
-		return;
+	if (Mflag) {
+		DIRDATA *dptr;
 
-	/*
-	 * get current position (where file name will start) so we can store it
-	 * in the trailer
-	 */
-	if ((dblk.npos = lseek(dirfd, 0L, SEEK_CUR)) < 0) {
+		if (dirtab == NULL)
+			return;
+
+		if ((dptr = (DIRDATA *)malloc(sizeof(DIRDATA))) != NULL &&
+		    (dptr->name = strdup(name)) != NULL) {
+			dptr->mode = psb->st_mode & 0xffff;
+			dptr->mtime = psb->st_mtime;
+			dptr->atime = psb->st_atime;
+			dptr->fflags = psb->st_flags;
+			dptr->frc_mode = frc_mode;
+			dptr->fow = dirtab;
+			dirtab = dptr;
+			return;
+		}
+		free(dptr);	/* NULL, or a node whose strdup() failed */
 		tty_warn(1,
-		    "Unable to store mode and times for directory: %s",name);
-		return;
-	}
+		    "Unable to store mode and times for created directory: %s",
+		    name);
 
-	/*
-	 * write the file name followed by the trailer
-	 */
-	dblk.nlen = nlen + 1;
-	dblk.mode = psb->st_mode & 0xffff;
-	dblk.mtime = psb->st_mtime;
-	dblk.atime = psb->st_atime;
-	dblk.fflags = psb->st_flags;
-	dblk.frc_mode = frc_mode;
-	if ((xwrite(dirfd, name, dblk.nlen) == dblk.nlen) &&
-	    (xwrite(dirfd, (char *)&dblk, sizeof(dblk)) == sizeof(dblk))) {
-		++dircnt;
-		return;
-	}
+	} else {
+		DIRDATA dblk;
 
-	tty_warn(1,
-	    "Unable to store mode and times for created directory: %s",name);
+		if (dirfd < 0)
+			return;
+
+		/*
+		 * get current position (where file name will start) so we can
+		 * store it in the trailer
+		 */
+		if ((dblk.npos = lseek(dirfd, 0L, SEEK_CUR)) < 0) {
+			tty_warn(1,
+			    "Unable to store mode and times for directory: %s",
+			    name);
+			return;
+		}
+
+		/*
+		 * write the file name followed by the trailer
+		 */
+		dblk.nlen = nlen + 1;
+		dblk.mode = psb->st_mode & 0xffff;
+		dblk.mtime = psb->st_mtime;
+		dblk.atime = psb->st_atime;
+		dblk.fflags = psb->st_flags;
+		dblk.frc_mode = frc_mode;
+		if ((xwrite(dirfd, name, dblk.nlen) == dblk.nlen) &&
+		    (xwrite(dirfd, (char *)&dblk, sizeof(dblk)) == sizeof(dblk))) {
+			++dircnt;
+			return;
+		}
+
+		tty_warn(1,
+		    "Unable to store mode and times for created directory: %s",
+		    name);
+	}
 	return;
 }
 
@@ -1313,48 +1358,75 @@ void
 proc_dir()
 #endif
 {
-	char name[PAXPATHLEN+1];
-	DIRDATA dblk;
-	u_long cnt;
+	if (Mflag) {
+		DIRDATA *dptr;
 
-	if (dirfd < 0)
-		return;
-	/*
-	 * read backwards through the file and process each directory
-	 */
-	for (cnt = 0; cnt < dircnt; ++cnt) {
+		if (dirtab == NULL)
+			return;
 		/*
-		 * read the trailer, then the file name, if this fails
-		 * just give up.
+		 * traverse list and process each directory
 		 */
-		if (lseek(dirfd, -((off_t)sizeof(dblk)), SEEK_CUR) < 0)
-			break;
-		if (xread(dirfd,(char *)&dblk, sizeof(dblk)) != sizeof(dblk))
-			break;
-		if (lseek(dirfd, dblk.npos, SEEK_SET) < 0)
-			break;
-		if (xread(dirfd, name, dblk.nlen) != dblk.nlen)
-			break;
-		if (lseek(dirfd, dblk.npos, SEEK_SET) < 0)
-			break;
+		dptr = dirtab;
+		while (dptr != NULL) {
+			if (dptr->name == NULL)
+				break;	/* last record */
+			/*
+			 * frc_mode set, make sure we set the file modes even if
+			 * the user didn't ask for it (see file_subs.c for more info)
+			 */
+			if (pmode || dptr->frc_mode)
+				set_pmode(dptr->name, dptr->mode);
+			if (patime || pmtime)
+				set_ftime(dptr->name, dptr->mtime, dptr->atime, 0);
+			if (pfflags)
+				set_chflags(dptr->name, dptr->fflags);
+
+			dptr = dptr->fow;
+		}
+	} else {
+		char name[PAXPATHLEN+1];
+		DIRDATA dblk;
+		u_long cnt;
 
+		if (dirfd < 0)
+			return;
 		/*
-		 * frc_mode set, make sure we set the file modes even if
-		 * the user didn't ask for it (see file_subs.c for more info)
+		 * read backwards through the file and process each directory
 		 */
-		if (pmode || dblk.frc_mode)
-			set_pmode(name, dblk.mode);
-		if (patime || pmtime)
-			set_ftime(name, dblk.mtime, dblk.atime, 0);
-		if (pfflags)
-			set_chflags(name, dblk.fflags);
-	}
+		for (cnt = 0; cnt < dircnt; ++cnt) {
+			/*
+			 * read the trailer, then the file name, if this fails
+			 * just give up.
+			 */
+			if (lseek(dirfd, -((off_t)sizeof(dblk)), SEEK_CUR) < 0)
+				break;
+			if (xread(dirfd,(char *)&dblk, sizeof(dblk)) != sizeof(dblk))
+				break;
+			if (lseek(dirfd, dblk.npos, SEEK_SET) < 0)
+				break;
+			if (xread(dirfd, name, dblk.nlen) != dblk.nlen)
+				break;
+			if (lseek(dirfd, dblk.npos, SEEK_SET) < 0)
+				break;
 
-	(void)close(dirfd);
-	dirfd = -1;
-	if (cnt != dircnt)
-		tty_warn(1,
-		    "Unable to set mode and times for created directories");
+			/*
+			 * frc_mode set, make sure we set the file modes even if
+			 * the user didn't ask for it (see file_subs.c for more info)
+			 */
+			if (pmode || dblk.frc_mode)
+				set_pmode(name, dblk.mode);
+			if (patime || pmtime)
+				set_ftime(name, dblk.mtime, dblk.atime, 0);
+			if (pfflags)
+				set_chflags(name, dblk.fflags);
+		}
+
+		(void)close(dirfd);
+		dirfd = -1;
+		if (cnt != dircnt)
+			tty_warn(1,
+			    "Unable to set mode and times for created directories");
+	}
 	return;
 }
 
Index: tables.h
===================================================================
RCS file: /cvsroot/basesrc/bin/pax/tables.h,v
retrieving revision 1.5
diff -p -u -r1.5 tables.h
--- tables.h	2000/02/17 03:12:26	1.5
+++ tables.h	2000/03/03 01:19:47
@@ -162,14 +162,18 @@ typedef struct atdir {
  * We MUST reset times from leaf to root (it will not work the other
  * direction).  Entries are recorded into a spool file to make reverse
  * reading faster.
+ *
+ * With the -M flag, this list is stored in memory instead of on disk.
  */
 
 typedef struct dirdata {
-	int nlen;	/* length of the directory name (includes \0) */
-	off_t npos;	/* position in file where this dir name starts */
-	mode_t mode;	/* file mode to restore */
-	time_t mtime;	/* mtime to set */
-	time_t atime;	/* atime to set */
-	long fflags;	/* file flags to set */
-	int frc_mode;	/* do we force mode settings? */
+	char *name;		/* file name (used with memory storage) */
+	int nlen;		/* length of the directory name (includes \0) */
+	off_t npos;		/* position in file where this dir name starts */
+	mode_t mode;		/* file mode to restore */
+	time_t mtime;		/* mtime to set */
+	time_t atime;		/* atime to set */
+	long fflags;		/* file flags to set */
+	int frc_mode;		/* do we force mode settings? */
+	struct dirdata *fow;	/* next entry in memory list */
 } DIRDATA;
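The -M bookkeeping in tables.c amounts to a stack. add_dir() pushes each created directory onto the head of a list that ends in a sentinel whose name is NULL. proc_dir() walks from the head, so entries come back newest first. This is the same order the disk version gets by reading its spool file backwards, and it is leaf-to-root as long as the archive lists parents before children.

The standalone sketch below mirrors that shape using hypothetical names (struct dir_rec, dir_list_new(), dir_list_push(), dir_list_drain()). It also releases a node whose strdup() fails, and frees each node once it has been processed.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() */

#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down analogue of the -M DIRDATA list. */
struct dir_rec {
	char *name;			/* NULL marks the sentinel */
	struct dir_rec *fow;		/* next (older) entry */
};

/* Like dir_start() with -M: allocate the sentinel record. */
static struct dir_rec *
dir_list_new(void)
{
	struct dir_rec *s = malloc(sizeof(*s));

	if (s != NULL) {
		s->name = NULL;
		s->fow = NULL;
	}
	return s;
}

/* Like add_dir() with -M: push a copy of name; 0 on success, -1 on error. */
static int
dir_list_push(struct dir_rec **head, const char *name)
{
	struct dir_rec *d = malloc(sizeof(*d));

	if (d == NULL)
		return -1;
	if ((d->name = strdup(name)) == NULL) {
		free(d);		/* don't leak the node */
		return -1;
	}
	d->fow = *head;
	*head = d;
	return 0;
}

/*
 * Like proc_dir() with -M: visit entries newest first, free every
 * node including the sentinel, and return the number visited.
 */
static size_t
dir_list_drain(struct dir_rec **head, void (*visit)(const char *, void *),
    void *arg)
{
	struct dir_rec *d = *head, *next;
	size_t n = 0;

	while (d != NULL) {
		next = d->fow;
		if (d->name != NULL) {
			visit(d->name, arg);
			free(d->name);
			n++;
		}
		free(d);
		d = next;
	}
	*head = NULL;
	return n;
}
```

Pushing "a", "a/b" and "a/b/c" in archive order and then draining visits "a/b/c", "a/b", "a". Each directory's time is therefore set only after everything beneath it has been handled.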