"Volkmar Seifert" <vs%nifelheim.info@localhost> writes:
>> The rule of thumb seems to be 2x number of CPU cores.
>
> (2x Number of CPU-Cores) + 1 to be exact, to have one job queued and
> ready, as soon as a current one is finished.
Have you tried controlled experiments with the various numbers? If
so, could you post what you did (including how you ensured the
filesystem caches were in the same state before each run) and what
the results were?
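
A minimal sketch of such an experiment, in Python, might look like the following. It is only an illustration: the function names are made up here, and the cache handling is an assumption (one untimed warm-up pass per -j value so that every timed run starts with similarly warm caches). A cold-cache comparison would need an OS-specific flush step that is not shown.

```python
import statistics
import subprocess
import time


def time_build(jobs, build_cmd, clean_cmd, repeats=3):
    """Return wall-clock seconds for each timed build with -j<jobs>.

    build_cmd and clean_cmd are argument lists, e.g. ["make"] and
    ["make", "clean"]. These names are hypothetical, not from any tool.
    """
    timings = []
    # First pass is an untimed warm-up (assumption: warm caches are the
    # state we want to hold constant); the remaining passes are timed.
    for timed in [False] + [True] * repeats:
        subprocess.run(clean_cmd, check=True)
        start = time.perf_counter()
        subprocess.run(build_cmd + [f"-j{jobs}"], check=True)
        elapsed = time.perf_counter() - start
        if timed:
            timings.append(elapsed)
    return timings


def compare(job_values, build_cmd, clean_cmd, repeats=3):
    """Map each -j value to (median, min, max) seconds over the timed runs."""
    results = {}
    for jobs in job_values:
        t = time_build(jobs, build_cmd, clean_cmd, repeats)
        results[jobs] = (statistics.median(t), min(t), max(t))
    return results
```

Calling compare([2, 3, 4, 5], ["make"], ["make", "clean"]) in a source tree would give a median and a spread per value. The spread matters: if the ranges overlap, the difference between two -j values is probably noise.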
While I see the point of 2N+1, there are a lot of complex interactions.
My memory is very fuzzy, but I remember -j3 being slightly faster than
-j4 on a Core Duo, though not by enough to worry about which value is
used. I think that might be because 3 jobs are usually enough to have
one ready when another finishes, or to run while another waits on file
reads, and because once there is enough concurrency, further jobs just
cause more rescheduling events and cache misses. If we had a scheduler
that knew jobs were bulk work belonging to a single -j4 run, and that
ran the newer ones only when the older ones weren't runnable, then
higher -j values might be even more effective.
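
For reference, the -j values under discussion can be derived from the core count N. The helper below is a hypothetical sketch, not part of any build tool. It lists N, N+1, 2N and 2N+1, which for a dual-core machine such as the Core Duo covers both -j3 (N+1) and -j4 (2N). Note that os.cpu_count() reports logical CPUs, which may differ from physical cores.

```python
import os


def candidate_job_counts(cores=None):
    """Return the distinct -j values worth comparing for a core count."""
    if cores is None:
        # os.cpu_count() may return None if the count is undetermined.
        cores = os.cpu_count() or 1
    # N, N+1, 2N and 2N+1; a set drops duplicates (e.g. N+1 == 2N for N=1).
    return sorted({cores, cores + 1, 2 * cores, 2 * cores + 1})
```

For example, candidate_job_counts(2) gives [2, 3, 4, 5].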