Subject: Was: Re: Cloning /dev/wd0 Now: rsync
To: Urban Boquist <urban@boquist.net>
From: Herb Peyerl <hpeyerl@beer.org>
List: netbsd-help
Date: 12/10/2002 09:41:17
Urban Boquist <urban@boquist.net>  wrote:
 > >>>>> Herb Peyerl writes:
 > 
 > Herb> You can try to use rsync to keep the filesystems sync'd up but
 > Herb> I've never had rsync work reliably; it's horribly broken.
 > 
 > Huh? Would you mind elaborating a bit on that?
 > 
 > I've used rsync for many years both to mirror disks and as a backup
 > tool and it usually works Just Great for me. When it comes to saving
 > bandwidth, the "rsync algorithm" is far superior to anything else that
 > I've ever seen. Granted, you need to understand what you are doing,
 > and use the correct command line flags, but saying that rsync is
 > "horribly broken" is unfair, to say the least, IMHO.

I've had this discussion dozens of times with other talented and
knowledgeable people over the years and over various versions of
rsync.  Lukem even challenged me once to prove it to him.  So we
ran 'rsync -vaxH' of a /usr/local to another disk on the same
machine and it merrily chugged away for several hours and then
suddenly came to a stop partway through, leaving two zombie children
with a parent spin-waiting on nothing of any particular interest.
Sometimes it takes a few runs to do this.  I have some machines
that rsync to a /backup volume and they will dutifully work for
weeks and then I'll wake up one morning to find that they've
wedged again and I have to kill them off.  There are no known
hardware problems on the machines in question and, as they're live
web servers serving pages for a European auto manufacturer, I
don't suspect there are any lingering, undiscovered ones either.

Then, of course, sometimes rsync will just sit wedged like this
for days before I finally kill it.  I started this rsync before
starting to reply to this message.  It happily ran while I wrote
the first paragraph and then promptly wedged:

[grok hpeyerl 6 ]; ps -ax | grep rsync
 8707 p5 S+    0:24.94 rsync -vaxH . /backup/home/ 
 8708 p5 S+    0:23.53 rsync -vaxH . /backup/home/ 
 8709 p5 S+    0:20.99 rsync -vaxH . /backup/home/ 
[grok hpeyerl 7 ]; !!
ps -ax | grep rsync
 8707 p5 S+    0:24.94 rsync -vaxH . /backup/home/ 
 8708 p5 S+    0:23.53 rsync -vaxH . /backup/home/ 
 8709 p5 S+    0:20.99 rsync -vaxH . /backup/home/ 
[grok hpeyerl 8 ]; !!
ps -ax | grep rsync
 8707 p5 S+    0:24.94 rsync -vaxH . /backup/home/ 
 8708 p5 S+    0:23.53 rsync -vaxH . /backup/home/ 
 8709 p5 S+    0:20.99 rsync -vaxH . /backup/home/ 
 8812 p8 RV    0:00.00 grep rsync (csh)
[grok hpeyerl 9 ]; !!
ps -ax | grep rsync
 8707 p5 S+    0:24.94 rsync -vaxH . /backup/home/ 
 8708 p5 S+    0:23.53 rsync -vaxH . /backup/home/ 
 8709 p5 S+    0:20.99 rsync -vaxH . /backup/home/ 


You'll note the conspicuous lack of any sort of progress over
the space of a minute.  It does accrue CPU time over the long
run but isn't actually accomplishing anything.  A 'ktrace -p'
shows it's just doing this:

  8707 rsync    CALL  gettimeofday(0xbfbfd8ac,0)
  8707 rsync    RET   gettimeofday 0
  8707 rsync    CALL  gettimeofday(0xbfbfd8a4,0)
  8707 rsync    RET   gettimeofday 0
  8707 rsync    CALL  select(0,0,0,0,0xbfbfd89c)
  8707 rsync    RET   select 0
  8707 rsync    CALL  gettimeofday(0xbfbfd8a4,0)
  8707 rsync    RET   gettimeofday 0
  8707 rsync    CALL  wait4(0x2204,0xbfbfd8e4,0x1,0)
  8707 rsync    RET   wait4 0
  8707 rsync    CALL  gettimeofday(0xbfbfd8ac,0)
  8707 rsync    RET   gettimeofday 0
  8707 rsync    CALL  gettimeofday(0xbfbfd8a4,0)
  8707 rsync    RET   gettimeofday 0
  8707 rsync    CALL  select(0,0,0,0,0xbfbfd89c)
  8707 rsync    RET   select 0
  8707 rsync    CALL  gettimeofday(0xbfbfd8a4,0)
  8707 rsync    RET   gettimeofday 0
  8707 rsync    CALL  wait4(0x2204,0xbfbfd8e4,0x1,0)
  8707 rsync    RET   wait4 0
  8707 rsync    CALL  gettimeofday(0xbfbfd8ac,0)
  8707 rsync    RET   gettimeofday 0
  8707 rsync    CALL  gettimeofday(0xbfbfd8a4,0)
  8707 rsync    RET   gettimeofday 0
  8707 rsync    CALL  select(0,0,0,0,0xbfbfd89c)
  8707 rsync    RET   select 0
  8707 rsync    CALL  gettimeofday(0xbfbfd8a4,0)
  8707 rsync    RET   gettimeofday 0
  8707 rsync    CALL  wait4(0x2204,0xbfbfd8e4,0x1,0)
  8707 rsync    RET   wait4 0
  8707 rsync    CALL  gettimeofday(0xbfbfd8ac,0)
  8707 rsync    RET   gettimeofday 0
  8707 rsync    CALL  gettimeofday(0xbfbfd8a4,0)
  8707 rsync    RET   gettimeofday 0
  8707 rsync    CALL  select(0,0,0,0,0xbfbfd89c)
    .
    .
    .
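
Reading that trace: select() with no descriptors and only a
timeout is just a sleep, and 0x2204 is 8708 in decimal, so the
wait4() is presumably pid 8707 checking on pid 8708 with options
0x1 (WNOHANG) and being told "not exited yet" every time.  A
minimal Python sketch of a loop with that shape follows.  It only
illustrates the syscall pattern and is not rsync's actual source;
poll_child and its parameters are names made up for the example.

```python
import os
import select
import subprocess
import sys

# The first wait4() argument in the trace, decoded: 0x2204 == 8708.
TRACED_PID = int("0x2204", 16)


def poll_child(pid, interval=0.02, max_polls=None):
    """Sleep, then ask without blocking whether `pid` has exited; repeat.

    Returns (polls, status). status is the raw wait status once the child
    has been reaped, or None if max_polls ran out first.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        # select() with no descriptors and a timeout is only a sleep
        # (this works on Unix, not on Windows).
        select.select([], [], [], interval)
        # WNOHANG: return at once, with pid 0, if the child is still running.
        reaped, status, _rusage = os.wait4(pid, os.WNOHANG)
        polls += 1
        if reaped == pid:
            return polls, status
    return polls, None


if __name__ == "__main__":
    # Hypothetical stand-in for the child: sleeps briefly, then exits.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(0.2)"]
    )
    polls, status = poll_child(child.pid)
    print("child reaped after", polls, "polls, status", status)
```

With max_polls left at None and a child that never exits, a loop
like this goes round forever without getting anything done, which
is what the trace above looks like.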


The last file copied was a ~/.ssh/authorized_keys.

Check back tomorrow and I bet it will still be running.
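
Killing these off by hand could in principle be automated.  A
minimal sketch of a watchdog, assuming that "no output for a
while" is a fair proxy for "wedged" (with -v, rsync names files
as it transfers them, though one very large file can also keep it
quiet for a long time).  run_with_stall_timeout and stall_seconds
are hypothetical names, and this only works around the hang; it
does nothing about the cause.

```python
import os
import select
import signal
import subprocess
import sys


def run_with_stall_timeout(argv, stall_seconds):
    """Run argv; kill its whole process group if it goes quiet too long.

    Returns (returncode, stalled).
    """
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        start_new_session=True,  # own process group, so its children die too
    )
    fd = proc.stdout.fileno()
    stalled = False
    while True:
        ready, _, _ = select.select([fd], [], [], stall_seconds)
        if not ready:
            # No output for stall_seconds: assume the command is wedged.
            stalled = True
            try:
                os.killpg(proc.pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # it exited on its own in the meantime
            break
        chunk = os.read(fd, 65536)
        if not chunk:  # EOF: every writer has closed the pipe
            break
        sys.stdout.write(chunk.decode(errors="replace"))
    proc.stdout.close()
    return proc.wait(), stalled
```

Something like run_with_stall_timeout(["rsync", "-vaxH", ".",
"/backup/home/"], 600) would then give up after ten silent
minutes instead of sitting there for days.  The whole process
group is killed because, as the ps output shows, there are three
rsync processes per run, not one.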

This is on i386, NetBSD 1.6, and rsync 2.4.6 built out of pkgsrc.
/home and /backup are normal FFS and there's no NFS involved
here anywhere...

So, unless you can point to something I'm doing that's obviously
wrong, I posit that rsync is a horribly broken POS.