tech-kern archive


Re: high sys time, very very slow builds on new 24-core system

On Wed, Mar 23, 2011 at 05:24:12PM -0400, Thor Lancelot Simon wrote:

> I have a new machine with 24 2GHz Opteron cores.  It has 32GB of RAM.
> Building with sources on a fast SSD ("preloaded" into the page
> cache before the build using tar > /dev/null) and obj, dest, and rel
> dirs on tmpfs, system builds are extraordinarily slow.  The system
> takes about 20 minutes to build a netbsd-5 based source tree
> with -j24 -- about the same amount of time as an older 8-core Intel
> based system running netbsd-5 requires with -j8.
> All cores spend well over 50% time in 'sys', even when all or almost
> all are running cc1 processes.  The kernel is amd64 -current GENERIC
> from about 1 week ago -- no DIAGNOSTIC, DEBUG, KMEMSTATS, LOCKDEBUG,
> etc.
> Does anyone have any idea what might be wrong here?

Try lockstat as suggested to see if something pathological is going on.  In
addition to showing lock contention problems it can often highlight a code
path being hit too frequently for some reason.

Have a look at the event counters and see if anything obviously ugly is
going on.

Also look for evidence of context switch storms. lockstat will show those
for lock objects, but not for condition variables or homegrown stuff
based off sleep queues.

What sort of TLB shootdown rate does systat vmstat show?  We have changes
forthcoming that should help with this during  Don't get me wrong,
the situation isn't particularly bad now on x86 for shootdowns but the
forthcoming changes improve it quite a bit.

I have a suspicion that SSDs could cause issues for us because the buffer
cache and other parts of the I/O system are not designed with near
instantaneous request->response in mind, but that likely isn't at play
here.  Do you have logging turned on for the SSD?

We have some algorithms in the scheduler and mutual exclusion code that
aren't designed for large numbers of cores, but I think they should be OK
with 24 CPUs.  (While I'm rambling about this, I think the SPINLOCK_BACKOFF
stuff should have some sort of randomness to it, perhaps based off curlwp and
cpu_counter(); otherwise things could proceed in lockstep.  Again, that's
probably not the issue here.)
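
To make the randomness idea concrete, here is a rough userland sketch of a
randomized exponential backoff.  It is not the kernel's SPINLOCK_BACKOFF
code.  The names backoff_seed() and backoff_random() and the
BACKOFF_MIN/BACKOFF_MAX bounds are made up for illustration, and the
lwp_id and cycles arguments merely stand in for values such as curlwp and
cpu_counter().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bounds, chosen only for illustration. */
#define BACKOFF_MIN	4u
#define BACKOFF_MAX	128u

/* One step of a xorshift32 generator; *state must be nonzero. */
uint32_t
xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/*
 * Mix a per-thread value and a cycle count into a nonzero seed, so
 * that two CPUs spinning on the same lock rarely start with the same
 * sequence.  Both inputs are stand-ins (assumptions), not kernel calls.
 */
uint32_t
backoff_seed(uintptr_t lwp_id, uint64_t cycles)
{
	uint32_t s = (uint32_t)lwp_id ^ (uint32_t)cycles ^
	    (uint32_t)(cycles >> 32);

	return s != 0 ? s : 1u;
}

/*
 * Spin for a random number of iterations in [1, *limit], then double
 * *limit, capped at BACKOFF_MAX.  Returns the number of iterations spun.
 */
uint32_t
backoff_random(uint32_t *limit, uint32_t *rng)
{
	uint32_t n;

	assert(*limit >= 1 && *limit <= BACKOFF_MAX);
	n = 1 + xorshift32(rng) % *limit;
	for (volatile uint32_t i = 0; i < n; i++)
		continue;	/* real spin code would use a pause hint here */
	*limit = (*limit > BACKOFF_MAX / 2) ? BACKOFF_MAX : *limit * 2;
	return n;
}
```

A caller would seed once per acquisition attempt, start with limit set to
BACKOFF_MIN, and call backoff_random() each time the lock is found held.
Because each waiter draws a different delay, waiters that collided once are
less likely to retry at the same instant.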
