Subject: Re: du output query
To: None <port-i386@netbsd.org>
From: Ray Phillips <r.phillips@jkmrc.com>
List: port-i386
Date: 07/31/2007 14:00:59
Thanks for replying to my question, Juergen and der Mouse.

It turned out the files in the directory are sparse [1], and rsync
created normal (fully allocated) files when it copied them.  In case
you're interested, I've appended to this message the output of some
commands I used to investigate.

I discovered rsync has a --sparse option and when I used it du gave 
the same value for both the rsync'ed and untar'ed directories.
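
A minimal sketch of that kind of copy follows.  The function name and
the paths in the usage line are hypothetical, not the ones I actually
used; -a and --sparse (short form -S) are standard rsync options.

```shell
#!/bin/sh
# copy_tree_sparse SRC DEST
# Copy the contents of directory SRC into DEST, asking rsync to write
# long runs of zero bytes as holes instead of allocated blocks.
# SRC and DEST are placeholders for illustration only.
copy_tree_sparse() {
    src=$1
    dest=$2
    # -a        archive mode: recurse and keep permissions, times, etc.
    # --sparse  try to recreate sparse files efficiently on the target
    rsync -a --sparse "$src"/ "$dest"/
}
```

Usage would be something like `copy_tree_sparse /path/to/DB /path/to/copy/DB`,
followed by `du -sh` on both directories to compare the results.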

I suppose it's possible for the amount of disk space allocated to a
sparse file to equal its size, so that ls -lsh would give the same
numbers in its first and sixth fields?  In that case, how do you tell
whether it is a sparse file or not?  Maybe it ceases to be a sparse
file as soon as this happens?
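
The comparison I have in mind can be sketched as below.  It assumes
the first field of ls -ls counts 512-byte blocks (BLOCKSIZE is unset
here) and the function name is made up for illustration.  As far as I
can tell, a file whose allocation has caught up with its size can't be
told apart from an ordinary file by these two numbers alone.

```shell
#!/bin/sh
# looks_sparse BLOCKS SIZE
#   BLOCKS  allocated blocks, assumed to be 512-byte units
#           (first field of ls -ls)
#   SIZE    file size in bytes (sixth field of ls -ls)
# Exit status 0 when fewer bytes are allocated than the size claims.
looks_sparse() {
    blocks=$1
    size=$2
    [ $((blocks * 512)) -lt "$size" ]
}

# Numbers for tuanl.db from the listings appended to this message:
if looks_sparse 352 196608; then
    echo "untar'ed copy: allocation smaller than size"
fi
if ! looks_sparse 384 196608; then
    echo "rsync'ed copy: allocation not smaller than size"
fi
```

Note that the test only says "allocation is smaller than size".  The
rsync'ed ying.db shows 5152 blocks, which is 2637824 bytes, for a
2621440-byte file, so the allocation can also exceed the size,
presumably because of file system overhead.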

Changing the subject slightly... the ls(1) man page says:

      -l      (The lowercase letter ``ell'').  List in long format.  (See
              below.)  A total sum for all the file sizes is output on a line
              before the long listing.

In this example, how does 'total 4' correspond to the file size of seven bytes?

$ ls -l
total 4
-rw-r--r--  1 defang  wheel  7 Jul 31 13:49 file1
$ set | grep BLOCK
$

Sorry, I'm sure I should know that already.
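
One possible reading, which is an assumption on my part rather than
something the excerpt above states: 'total' counts the blocks
allocated to the files, in 512-byte units when BLOCKSIZE is unset,
instead of summing their sizes in bytes.  The arithmetic would then
be:

```shell
#!/bin/sh
# blocks_to_bytes BLOCKS [BLOCKSIZE]
# Convert a block count, as printed on the "total" line, to bytes.
# The default of 512 is an assumption about what ls uses when the
# BLOCKSIZE environment variable is unset.
blocks_to_bytes() {
    echo $(( $1 * ${2:-512} ))
}

blocks_to_bytes 4    # prints 2048
```

If that is right, 'total 4' would mean 2048 bytes are allocated to
hold the seven bytes of file1, which would fit a file system whose
smallest allocation unit is 2 KB (again, an assumption).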


Ray



[1]  I assume they are all sparse.



I tried copying the directory again...

$ pwd
/var/spool/MD-Bayes/tar
$ ls -l
total 870532
drwx------  25 defang  wheel        512 Jul 27 13:19 DB
-rw-r--r--   1 defang  wheel  445573120 Jul 31 10:09 ck-DB.tar
$ ls -l ..
total 8
drwx------  25 defang  wheel  512 Jul 27 13:19 DB
drwxr-xr-x   3 defang  wheel  512 Jul 31 10:11 tar
$ du -sh DB
357M    DB
$ du -sh ../DB
426M    ../DB
$ find DB -type f | wc -l ; find DB -type d | wc -l
       78
       81
$ find ../DB -type f | wc -l ; find ../DB -type d | wc -l
       78
       81
$ diff -r DB ../DB
$

Check if all files are the same type:

$ find DB -type f | grep -v '.db$'
$ file DB/*.db
DB/bayes.db: TTComp archive data
$ find DB -type f -exec file {} \; | grep -v 'TTComp archive data'
$

No files are hard linked together (I ran this command on the source
machine and, on the target machine, on the untar'ed copy of the
directory):

$ pwd
/var/spool/MD-Bayes/tar
$ ls -lR DB | egrep '^-' | awk '$2 != 1'
$
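
The same check can be sketched with find instead of parsing ls
output.  The function name is hypothetical; -type and -links are
standard find primaries.

```shell
#!/bin/sh
# list_hard_linked DIR
# Print every regular file under DIR whose link count is greater than
# one, i.e. files that share their data with another name.
list_hard_linked() {
    find "$1" -type f -links +1
}
```

Running `list_hard_linked DB` and getting no output would correspond
to the empty result above.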

For each file, extract the number of blocks used, the file size and
the file name, sorted by file name:

$ ls -lasR DB | egrep '\-r' | awk '{print $1 "\t" $6 "\t" $10}' | \
>  sort -k3,3 > las-tar
$ ls -lasR ../DB | egrep '\-r' | awk '{print $1 "\t" $6 "\t" $10}' | \
>  sort -k3,3 > las-rsync
$ ls -l
total 870540
drwx------  25 defang  wheel        512 Jul 27 13:19 DB
-rw-r--r--   1 defang  wheel  445573120 Jul 31 10:09 ck-DB.tar
-rw-r--r--   1 defang  wheel       1722 Jul 31 10:40 las-rsync
-rw-r--r--   1 defang  wheel       1722 Jul 31 10:38 las-tar
$ wc -l las*
       78 las-rsync
       78 las-tar
      156 total
$ tail -5 las*
==> las-rsync <==
128     65536   ttt6.db
384     196608  tuanl.db
128     65536   weiranz.db
224     114688  williamc.db
5152    2621440 ying.db

==> las-tar <==
128     65536   ttt6.db
352     196608  tuanl.db
128     65536   weiranz.db
192     114688  williamc.db
5024    2621440 ying.db
$
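
The egrep '\-r' filter above would also match, for example, a name
containing "-r".  A slightly stricter sketch of the same extraction
is below; it assumes the ten-column ls -las layout shown in this
message and file names without whitespace, and the function name is
made up.

```shell
#!/bin/sh
# blocks_size_name
# Read `ls -lasR` output on stdin and write
#   blocks<TAB>size<TAB>name
# for regular files only (mode field starting with "-"), sorted by name.
blocks_size_name() {
    awk 'NF == 10 && $2 ~ /^-/ { print $1 "\t" $6 "\t" $10 }' |
        LC_ALL=C sort -k3,3
}
```

It would be used as `ls -lasR DB | blocks_size_name > las-tar`.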

Check that each line has three fields:

$ cat las* | awk 'NF != 3'
$

that file names are the same in both directories:

$ cat las* | awk '{print $3}' | sort | uniq -c | sed 's/  *//' | grep -v '^2'
$

and that the file sizes are the same:

$ paste las* | awk '$2 != $5'
$

See which files have different block counts:

$ paste las-rsync las-tar | \
>  awk '$1 != $4 {print $1 "\t" $4 "\t" $3}' > block-diffs
$ wc -l block-diffs
       60 block-diffs
$ tail -5 block-diffs
5120    4224    suzy.db
2592    2176    tonik.db
384     352     tuanl.db
224     192     williamc.db
5152    5024    ying.db
$
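
To put a number on the difference, the two block columns can be
subtracted and summed.  This is only a sketch: the function name is
hypothetical and the conversion assumes 512-byte blocks.

```shell
#!/bin/sh
# sum_block_diffs
# Read "rsync_blocks tar_blocks name" lines on stdin (the layout of
# block-diffs) and print the total difference in blocks and in bytes,
# assuming 512-byte blocks.
sum_block_diffs() {
    awk '{ extra += $1 - $2 }
         END { printf "%d blocks, %d bytes\n", extra, extra * 512 }'
}

# The five lines shown by tail above:
printf '%s\n' \
    '5120 4224 suzy.db' \
    '2592 2176 tonik.db' \
    '384 352 tuanl.db' \
    '224 192 williamc.db' \
    '5152 5024 ying.db' | sum_block_diffs
# prints: 1504 blocks, 770048 bytes
```

Running `sum_block_diffs < block-diffs` would give the figure for all
60 files.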