NetBSD-Users archive


Re: distcc for pkgsrc issue



Swift Griggs <swiftgriggs%gmail.com@localhost> writes:

> On Fri, 7 Jul 2017, John Halfpenny wrote:
>> Just an update for posterity that I resolved this issue.
>
> Interesting. The wrapper script idea reminds me of another question
> about distcc and friends. I've noticed that some packages complain
> with great aggravation about my use of "make -jX" where X=CPUs.

Presumably you mean "build fails in ways that are hard to reproduce, and
we think it's because of makefile bugs where dependencies that actually
exist are not expressed in the makefile, but with make -j1 no one
notices"?  If so, yes, that's how it is...

> The question is, since folks make heavy use of distcc, does it have
> the same limitations and when you hit a compilation error related to a
> parallel compiler run, is that how the "dont-use-parallel-make"
> warnings get there, or are the mechanics of 'make -j8' and distcc so
> different that errors in one don't mean problems with the other?

They are basically orthogonal, except that actually hitting a "make -j"
bug is probabilistic.

-j8 lets make have 8 jobs running, and doesn't change the compiler.

distcc says that instead of calling cc locally, one essentially does rsh*
to some other box to run cc, after sending the input, and then gets the
output.  There is no explicit relationship to job number.

* I actually use ssh with a control socket and a 'ssh target sleep
  86400' running, so the subsequent ssh commands are fast.
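A minimal sketch of the control-socket setup described in the footnote,
assuming OpenSSH's -M (master mode) and -S (control socket path)
options.  The function names, the host name "buildhost" and the socket
location are placeholders for illustration, not taken from the message.

```shell
# Hypothetical helper: where the control socket for a given host lives.
ctl_socket() {
    printf '%s/.ssh/ctl-%s\n' "$HOME" "$1"
}

# Open a master connection and keep it alive with a long-running sleep,
# as in the footnote.  Runs in the background.
start_master() {
    ssh -M -S "$(ctl_socket "$1")" "$1" sleep 86400 &
}

# Later commands reuse the master's socket, so they skip the connection
# and authentication setup and start quickly.
run_remote() {
    host=$1
    shift
    ssh -S "$(ctl_socket "$host")" "$host" "$@"
}
```

Usage would be along the lines of "start_master buildhost" once, then
"run_remote buildhost cc --version" for each subsequent command.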

Overall, it's good to keep all cpus busy without hammering the disk any
more than you need to, so I tend to use -jN where N is 1.5x the cpus,
for local.  With distcc, there is latency shipping the jobs back and
forth, so I tend to go even higher, perhaps -j12 when I am using (only)
a remote 4-core machine.  Basically I recommend looking at CPU
utilization on the build box when compiling something that is really
parallelizable and finding the smallest -jN value that results in
sustained 90%+ or so loading, avoiding driving the load average to more
than about 1.5x the CPU count.  Yes, I know that's very handwavy.
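
As a rough illustration of the arithmetic above, here is a small sketch
that turns a CPU count and a multiplier into a starting -jN value.  The
helper name jobs_for and the tenths-based multiplier are assumptions
made for the example; the result is only a starting point to adjust
while watching CPU utilization and load average.

```shell
# jobs_for CPUS TENTHS: print CPUS * (TENTHS / 10), rounded up.
# TENTHS is the multiplier in tenths, so 15 means 1.5x and 30 means 3x.
jobs_for() {
    echo $(( ($1 * $2 + 9) / 10 ))
}

local_jobs=$(jobs_for 4 15)    # 1.5x a 4-CPU box, the local rule of thumb
remote_jobs=$(jobs_for 4 30)   # 3x, matching -j12 for a remote 4-core machine
echo "local: -j$local_jobs  remote: -j$remote_jobs"
```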

With timing and number of jobs different, I would expect a different
subset of latent bugs to actually show up.

> The reason I'm asking is occasionally I'd use distcc to get a few of
> my faster NetBSD boxes compiling things like QT or Firefox in
> something less than 60 minutes. It's just that I've never set it up
> because of the many failures I've had trying to use make -j ...

I hope you are using

MAKE_JOBS=8

rather than something else.  Assuming so, note that there is a package
variable "MAKE_JOBS_SAFE", which is supposed to be unset normally and
"no" when a package is known to have a bug building with more than one
job (regardless of whether we know what the bug is).

So if you find something that fails, try with different -j values,
especially 1, and also try restarting the build after failure.  If you
can convince yourself there's a make -j bug, post your logic and someone
can add MAKE_JOBS_SAFE=no to that package.
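
The procedure above could be sketched as a small loop that tries a build
at several job counts and reports the first one that succeeds.  The
function names are hypothetical, and pkgsrc_build assumes it is run from
the package's pkgsrc directory and that MAKE_JOBS can be overridden on
the make command line.

```shell
# find_passing_jobs BUILD_CMD J1 J2 ...
# Run BUILD_CMD with each job count in turn; print the first count for
# which it exits successfully, or return 1 if none does.
find_passing_jobs() {
    build_cmd=$1
    shift
    for j in "$@"; do
        if "$build_cmd" "$j" >/dev/null 2>&1; then
            echo "$j"
            return 0
        fi
    done
    return 1
}

# Hypothetical build step: clean, then build with the given MAKE_JOBS.
pkgsrc_build() {
    make clean && make MAKE_JOBS="$1" build
}
```

For example, "find_passing_jobs pkgsrc_build 8 4 2 1" prints the first
job count that builds.  Since these bugs are timing dependent, a single
pass or fail proves little; repeating the run at the same value gives
better evidence.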



