Subject: Re: du output query
To: None <port-i386@netbsd.org>
From: Ray Phillips <r.phillips@jkmrc.com>
List: port-i386
Date: 07/31/2007 14:00:59
Thanks for replying to my question, Juergen and der Mouse.
It turned out that the files in the directory are sparse [1], and rsync
created normal (non-sparse) files when it copied them. In case you're
interested, I've appended the output of the commands I used to
investigate.
I discovered that rsync has a --sparse option; when I used it, du gave
the same value for both the rsync'ed and untar'ed directories.
I suppose it's possible for the amount of disk space allocated to a
sparse file to be the same as its size, and ls -lsh would give the
same numbers for its first and sixth fields? In that case, how do
you tell if it is a sparse file or not? Maybe it ceases to be a
sparse file as soon as this happens?
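One rough way to check, as a sketch only: compare the space implied by the
block count (the first field of ls -ls) with the size in bytes (the sixth
field). The function name looks_sparse is made up for illustration, and it
assumes ls reports 512-byte blocks, which seems to be the case here since
BLOCKSIZE is unset. It is only a heuristic: a file whose blocks cover its
whole size is reported as not sparse.

```shell
# looks_sparse SIZE_BYTES BLOCKS
# Succeeds (exit 0) when the allocated space is smaller than the file size.
# Assumption: BLOCKS counts 512-byte units, as in the first field of ls -ls.
looks_sparse() {
    size=$1
    blocks=$2
    [ $((blocks * 512)) -lt "$size" ]
}

# Numbers for ying.db, taken from the listings appended to this message:
looks_sparse 2621440 5024 && echo "tar copy: looks sparse"
looks_sparse 2621440 5152 || echo "rsync copy: fully allocated"
```

With those numbers the tar copy has fewer blocks than its size needs, while
the rsync copy has slightly more (presumably filesystem overhead).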
Changing the subject slightly... ls(1) says:
     -l      (The lowercase letter ``ell''). List in long format. (See
             below.) A total sum for all the file sizes is output on a line
             before the long listing.
In this example, how does 'total 4' correspond to the file size of seven bytes?
$ ls -l
total 4
-rw-r--r-- 1 defang wheel 7 Jul 31 13:49 file1
$ set | grep BLOCK
$
Sorry, I'm sure I should know that already.
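One possible reading, sketched under assumptions that have not been checked
on this machine: if the total counts allocated 512-byte blocks rather than
bytes, and the filesystem hands out space in 2048-byte fragments, then a
seven-byte file would occupy one fragment, which is four blocks. The
fragment size and the name total_blocks are both assumptions.

```shell
# total_blocks SIZE_BYTES FRAG_BYTES
# Prints the number of 512-byte blocks needed to hold SIZE_BYTES when space
# is handed out in whole fragments of FRAG_BYTES (both values are assumed).
total_blocks() {
    size=$1
    frag=$2
    frags=$(( (size + frag - 1) / frag ))   # round up to whole fragments
    echo $(( frags * frag / 512 ))
}

total_blocks 7 2048    # prints 4
```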
Ray
[1] I assume they are all sparse.
I tried copying the directory again...
$ pwd
/var/spool/MD-Bayes/tar
$ ls -l
total 870532
drwx------ 25 defang wheel 512 Jul 27 13:19 DB
-rw-r--r-- 1 defang wheel 445573120 Jul 31 10:09 ck-DB.tar
$ ls -l ..
total 8
drwx------ 25 defang wheel 512 Jul 27 13:19 DB
drwxr-xr-x 3 defang wheel 512 Jul 31 10:11 tar
$ du -sh DB
357M DB
$ du -sh ../DB
426M ../DB
$ find DB -type f | wc -l ; find DB -type d | wc -l
78
81
$ find ../DB -type f | wc -l ; find ../DB -type d | wc -l
78
81
$ diff -r DB ../DB
$
Check if all files are the same type:
$ find DB -type f | grep -v '.db$'
$ file DB/*.db
DB/bayes.db: TTComp archive data
$ find DB -type f -exec file {} \; | grep -v 'TTComp archive data'
$
No files are hard-linked together (I ran this command both on the
source machine and, on the target machine, on the untar'ed copy of the
directory):
$ pwd
/var/spool/MD-Bayes/tar
$ ls -lR DB | egrep '^-' | awk '$2 != 1'
$
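An alternative that avoids parsing ls output, as a sketch: find's -links
primary can select regular files whose link count is greater than one. The
wrapper name list_hardlinked is hypothetical; as with the command above, no
output would mean no hard-linked files.

```shell
# list_hardlinked DIR
# Prints every regular file under DIR that has more than one hard link.
list_hardlinked() {
    find "$1" -type f -links +1
}

# Usage: list_hardlinked DB
```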
For each file, extract number of blocks used, file size and name of
file, sorting by file name:
$ ls -lasR DB | egrep '\-r' | awk '{print $1 "\t" $6 "\t" $10}' | \
> sort -k3,3 > las-tar
$ ls -lasR ../DB | egrep '\-r' | awk '{print $1 "\t" $6 "\t" $10}' | \
> sort -k3,3 > las-rsync
$ ls -l
total 870540
drwx------ 25 defang wheel 512 Jul 27 13:19 DB
-rw-r--r-- 1 defang wheel 445573120 Jul 31 10:09 ck-DB.tar
-rw-r--r-- 1 defang wheel 1722 Jul 31 10:40 las-rsync
-rw-r--r-- 1 defang wheel 1722 Jul 31 10:38 las-tar
$ wc -l las*
78 las-rsync
78 las-tar
156 total
$ tail -5 las*
==> las-rsync <==
128 65536 ttt6.db
384 196608 tuanl.db
128 65536 weiranz.db
224 114688 williamc.db
5152 2621440 ying.db
==> las-tar <==
128 65536 ttt6.db
352 196608 tuanl.db
128 65536 weiranz.db
192 114688 williamc.db
5024 2621440 ying.db
$
Check that each line has three fields:
$ cat las* | awk 'NF != 3'
$
that file names are the same in both directories:
$ cat las* | awk '{print $3}' | sort | uniq -c | sed 's/ *//' | grep -v '^2'
$
and that the file sizes are the same:
$ paste las* | awk '$2 != $5'
$
See which files have different block counts (rsync copy first, tar copy
second):
$ paste las-rsync las-tar | \
> awk '$1 != $4 {print $1 "\t" $4 "\t" $3}' > block-diffs
$ wc -l block-diffs
60 block-diffs
$ tail -5 block-diffs
5120 4224 suzy.db
2592 2176 tonik.db
384 352 tuanl.db
224 192 williamc.db
5152 5024 ying.db
$
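To put a number on this, the per-file differences could be added up and
compared with the gap between the two du figures. A sketch, assuming the
three-column layout of block-diffs shown above and 512-byte blocks;
sum_block_diffs is a made-up name:

```shell
# sum_block_diffs FILE
# FILE has lines of the form: rsync_blocks <tab> tar_blocks <tab> name.
# Prints the total extra blocks in the rsync copy and the same in bytes,
# assuming 512-byte blocks.
sum_block_diffs() {
    awk '{ extra += $1 - $2 }
         END { printf "%d blocks, %d bytes\n", extra, extra * 512 }' "$1"
}

# Usage: sum_block_diffs block-diffs
```

For just the five lines shown by tail above, this would come to 1504 blocks,
or 770048 bytes.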