Subject: Re: Was: Re: Cloning /dev/wd0 Now: rsync
To: None <netbsd-help@netbsd.org>
From: Christos Zoulas <christos@zoulas.com>
List: netbsd-help
Date: 12/10/2002 18:14:44
In article <200212101641.gBAGfIT09021@grok.beer.org>,
Herb Peyerl <hpeyerl@beer.org> wrote:
>Urban Boquist <urban@boquist.net> wrote:
> > >>>>> Herb Peyerl writes:
> >
> > Herb> You can try to use rsync to keep the filesystems sync'd up but
> > Herb> I've never had rsync work reliably; it's horribly broken.
> >
> > Huh? Would you mind elaborating a bit on that?
> >
> > I've used rsync for many years both to mirror disks and as a backup
> > tool and it usually works Just Great for me. When it comes to saving
> > bandwidth, the "rsync algorithm" is far superior to anything else that
> > I've ever seen. Granted, you need to understand what you are doing,
> > and use the correct command line flags, but saying that rsync is
> > "horribly broken" is unfair, to say the least, IMHO.
>
>I've had this discussion dozens of times with other talented and
>knowledgeable people over the years and over various versions of
>rsync. Lukem even challenged me once to prove it to him. So we
>ran 'rsync -vaxH' of a /usr/local to another disk on the same
>machine and it merrily chugged away for several hours and then
>suddenly came to a stop partway through, leaving 2 Zombie children
>with a parent spin-waiting on nothing particularly of interest.
>Sometimes it takes a few runs to do this. I have some machines
>that rsync to a /backup volume and they will dutifully work for
>weeks and then I'll wake up one morning to find that they've
>wedged again and I have to kill them off. There are no hardware
>problems on the machines in question and, as they're live
>web servers serving pages for a European auto manufacturer, I don't
>suspect there are any lingering and unknown hardware problems.
>
>Then of course, sometimes rsync will just wedge like this for
>days before I finally have to kill them... I started this
>rsync before starting to reply to this message... It happily
>ran for the first paragraph and then promptly wedged:
>
>[grok hpeyerl 6 ]; ps -ax | grep rsync
> 8707 p5 S+ 0:24.94 rsync -vaxH . /backup/home/
> 8708 p5 S+ 0:23.53 rsync -vaxH . /backup/home/
> 8709 p5 S+ 0:20.99 rsync -vaxH . /backup/home/
>[grok hpeyerl 7 ]; !!
>ps -ax | grep rsync
> 8707 p5 S+ 0:24.94 rsync -vaxH . /backup/home/
> 8708 p5 S+ 0:23.53 rsync -vaxH . /backup/home/
> 8709 p5 S+ 0:20.99 rsync -vaxH . /backup/home/
>[grok hpeyerl 8 ]; !!
>ps -ax | grep rsync
> 8707 p5 S+ 0:24.94 rsync -vaxH . /backup/home/
> 8708 p5 S+ 0:23.53 rsync -vaxH . /backup/home/
> 8709 p5 S+ 0:20.99 rsync -vaxH . /backup/home/
> 8812 p8 RV 0:00.00 grep rsync (csh)
>[grok hpeyerl 9 ]; !!
>ps -ax | grep rsync
> 8707 p5 S+ 0:24.94 rsync -vaxH . /backup/home/
> 8708 p5 S+ 0:23.53 rsync -vaxH . /backup/home/
> 8709 p5 S+ 0:20.99 rsync -vaxH . /backup/home/
>
>
>You'll note the conspicuous lack of any sort of progress over
>the space of a minute. It does accrue cpu usage over the
>progress of time but isn't actually accomplishing anything. A
>'ktrace -p' shows it's just doing this:
>
> 8707 rsync CALL gettimeofday(0xbfbfd8ac,0)
> 8707 rsync RET gettimeofday 0
> 8707 rsync CALL gettimeofday(0xbfbfd8a4,0)
> 8707 rsync RET gettimeofday 0
> 8707 rsync CALL select(0,0,0,0,0xbfbfd89c)
> 8707 rsync RET select 0
> 8707 rsync CALL gettimeofday(0xbfbfd8a4,0)
> 8707 rsync RET gettimeofday 0
> 8707 rsync CALL wait4(0x2204,0xbfbfd8e4,0x1,0)
OK, that is the parent waiting for the child to finish.
In this case pid 0x2204 = 8708. What is the child doing,
and why is it not finishing?
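
For anyone who wants to check the arithmetic, here is a rough sketch that
decodes the pid and options arguments of the wait4() line in the trace.
The helper name decode_wait4 and the flag table are made up for
illustration. The flag values (WNOHANG = 0x1, WUNTRACED = 0x2) are assumed
to match NetBSD's <sys/wait.h> and are not read from the system.

```python
# Flag values assumed to match NetBSD's <sys/wait.h>; hard-coded for illustration.
WAIT_FLAGS = {0x1: "WNOHANG", 0x2: "WUNTRACED"}


def decode_wait4(pid_arg, options_arg):
    """Decode the hex pid and options arguments of a ktrace'd wait4() call."""
    pid = int(pid_arg, 16)
    options = int(options_arg, 16)
    # Collect the name of every known flag bit set in options.
    names = [name for bit, name in sorted(WAIT_FLAGS.items()) if options & bit]
    return pid, names


if __name__ == "__main__":
    # Arguments taken from the trace line: wait4(0x2204,0xbfbfd8e4,0x1,0)
    print(decode_wait4("0x2204", "0x1"))
```

If those flag values hold, the third argument 0x1 is WNOHANG, so the parent
is polling for pid 8708 rather than blocking, which would fit the repeating
gettimeofday/select/wait4 pattern in the trace.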
christos