Subject: Re: dump and nodump flag
To: Brian C. Grayson <bgrayson@marvin.ece.utexas.edu>
From: Manuel Bouyer <bouyer@antioche.lip6.fr>
List: tech-kern
Date: 03/07/1999 16:51:18
On Mar 3, Brian C. Grayson wrote
>   Hm.  Is there any reason we can't do pass 1 and 2 (including
> the nodump stuff) at the same time?  Correct me if I'm wrong on
> any of this:
> 
> Pass 1:  Find all inodes that need to be dumped (modified since
>     last dump).
> 
> Pass 2:  Look at each directory, and figure out if it contains
>     any files flagged in pass 1.
> 
>   Couldn't we just use a recursive traversal of the (possibly
> unmounted) FS, and do it all, including checking for nodump
> flags, all in one pass?
> 
> Pseudo-code:
> DumpPass1And2(inode) {  /*  DumpPass1And2 returns the size
> 			    estimate for this and any children inodes.  */
>   if (inode is a directory) {
>     if not dumpable due to nodump flag, return 0;
>     otherwise, call DumpPass1And2 for each child and add up return values.
>     if (return_sum == 0) return 0;  /*  We don't need to be dumped either.  */
>     set bit for inode.
>     set bit for directory.
>     return return_sum+estimate for myself;
>   }
>   /* else, it's a file.  */
>   if not dumpable, return 0;
>   set bit for inode
>   return estimate;
> }
> 
> /*  The tape estimate is easy to calculate.  :)  */
> tape_estimate = DumpPass1And2(rootino);
> 
>   In fact, a traversal, with the new nodump mods, would
> actually perform fewer disk accesses, as the inodes for files
> under a nodump directory would never be examined.  It would not
> necessarily walk over the inodes in order, though, so maybe
> it's a wash from the performance point of view.  But I think it
> would be much cleaner code!  And many fewer lines, too!
> 
>   One other advantage to going to a cleanly-written
> traversal-style method is, it would be nearly trivial to write
> dump_ext2fs and dump_lfs (not to be confused with the existing
> dumplfs, which is really more like lfsdb), and maybe even
> dump_nfs, dump_msdos, dump_mfs, dump_ados.
> 

Well, this has the usual problem of recursive code: memory usage on
the stack can become very higth, which can be a problem for low-memory
machines. The second problem is that inodes are not read in order,
which can make things considerably slower. Maybe both can be solved
with an intelligent caching algorithm, but this is not an easy task.

--
Manuel Bouyer, LIP6, Universite Paris VI.           Manuel.Bouyer@lip6.fr
--