Subject: Re: Was: Re: Cloning /dev/wd0 Now: rsync
To: None <netbsd-help@netbsd.org>
From: Christos Zoulas <christos@zoulas.com>
List: netbsd-help
Date: 12/10/2002 18:14:44
In article <200212101641.gBAGfIT09021@grok.beer.org>,
Herb Peyerl <hpeyerl@beer.org> wrote:
>Urban Boquist <urban@boquist.net>  wrote:
> > >>>>> Herb Peyerl writes:
> > 
> > Herb> You can try to use rsync to keep the filesystems sync'd up but
> > Herb> I've never had rsync work reliably; it's horribly broken.
> > 
> > Huh? Would you mind elaborating a bit on that?
> > 
> > I've used rsync for many years both to mirror disks and as a backup
> > tool and it usually works Just Great for me. When it comes to saving
> > bandwidth, the "rsync algorithm" is far superior to anything else that
> > I've ever seen. Granted, you need to understand what you are doing,
> > and use the correct command line flags, but saying that rsync is
> > "horribly broken" is unfair, to say the least, IMHO.
>
>I've had this discussion dozens of times with other talented and
>knowledgeable people over the years and over various versions of
>rsync.  Lukem even challenged me once to prove it to him. So we
>ran 'rsync -vaxH' of a /usr/local to another disk on the same
>machine and it merrily chugged away for several hours and then
>suddenly came to a stop partway through, leaving two zombie children
>with a parent spin-waiting on nothing of particular interest.
>Sometimes it takes a few runs to do this.  I have some machines
>that rsync to a /backup volume and they will dutifully work for
>weeks and then I'll wake up one morning to find that they've 
>wedged again and I have to kill them off.  There are no hardware
>problems on the machines in question and, as they're live web
>servers serving pages for a European auto manufacturer, I don't
>suspect there are any lingering and unknown hardware problems.
>
>Then of course, sometimes rsync will just wedge like this for
>days before I finally have to kill them... I started this
>rsync before starting to reply to this message... It happily
>ran for the first paragraph and then promptly wedged:
>
>[grok hpeyerl 6 ]; ps -ax | grep rsync
> 8707 p5 S+    0:24.94 rsync -vaxH . /backup/home/ 
> 8708 p5 S+    0:23.53 rsync -vaxH . /backup/home/ 
> 8709 p5 S+    0:20.99 rsync -vaxH . /backup/home/ 
>[grok hpeyerl 7 ]; !!
>ps -ax | grep rsync
> 8707 p5 S+    0:24.94 rsync -vaxH . /backup/home/ 
> 8708 p5 S+    0:23.53 rsync -vaxH . /backup/home/ 
> 8709 p5 S+    0:20.99 rsync -vaxH . /backup/home/ 
>[grok hpeyerl 8 ]; !!
>ps -ax | grep rsync
> 8707 p5 S+    0:24.94 rsync -vaxH . /backup/home/ 
> 8708 p5 S+    0:23.53 rsync -vaxH . /backup/home/ 
> 8709 p5 S+    0:20.99 rsync -vaxH . /backup/home/ 
> 8812 p8 RV    0:00.00 grep rsync (csh)
>[grok hpeyerl 9 ]; !!
>ps -ax | grep rsync
> 8707 p5 S+    0:24.94 rsync -vaxH . /backup/home/ 
> 8708 p5 S+    0:23.53 rsync -vaxH . /backup/home/ 
> 8709 p5 S+    0:20.99 rsync -vaxH . /backup/home/ 
>
>
>You'll note the conspicuous lack of any sort of progress over
>the space of a minute.  It does accrue CPU time as time
>passes but isn't actually accomplishing anything. A
>'ktrace -p' shows it's just doing this:
>
>  8707 rsync    CALL  gettimeofday(0xbfbfd8ac,0)
>  8707 rsync    RET   gettimeofday 0
>  8707 rsync    CALL  gettimeofday(0xbfbfd8a4,0)
>  8707 rsync    RET   gettimeofday 0
>  8707 rsync    CALL  select(0,0,0,0,0xbfbfd89c)
>  8707 rsync    RET   select 0
>  8707 rsync    CALL  gettimeofday(0xbfbfd8a4,0)
>  8707 rsync    RET   gettimeofday 0
>  8707 rsync    CALL  wait4(0x2204,0xbfbfd8e4,0x1,0)

OK, that is the parent waiting for the child to finish.
In this case the first argument to wait4, 0x2204, is pid 8708.
What is the child doing instead of finishing?
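
For anyone reading the trace along, here is a rough sketch of how the
wait4 line decodes and what such a polling loop looks like. It is only
an illustration, not rsync's actual code: the helper names
decode_wait4_pid and poll_child are made up, and it assumes the third
argument 0x1 is WNOHANG (which it is on NetBSD), meaning the parent
checks on the child without blocking and then sleeps briefly before
trying again.

```python
import os
import time


def decode_wait4_pid(hex_pid: str) -> int:
    """Convert the hex pid printed by ktrace (e.g. '0x2204') to decimal."""
    return int(hex_pid, 16)


def poll_child(pid: int, interval: float = 0.02, max_polls: int = 500):
    """Poll a child with WNOHANG, sleeping between attempts.

    Hypothetical helper mirroring the pattern in the trace: a
    non-blocking wait followed by a short sleep. Returns the raw exit
    status, or None if the child never finished within max_polls tries.
    """
    for _ in range(max_polls):
        # With WNOHANG, waitpid returns (0, 0) while the child is still running.
        done_pid, status = os.waitpid(pid, os.WNOHANG)
        if done_pid == pid:
            return status
        time.sleep(interval)
    return None


if __name__ == "__main__":
    # The pid from the ktrace output quoted above.
    print(decode_wait4_pid("0x2204"))
```

A parent stuck in a loop like this keeps making system calls for as
long as the child stays alive, so the interesting process to ktrace is
the child (8708), not the parent.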

christos