Subject: Re: port-amd64/31359 (SMP amd64 system exhibits excessive cpu shootdown IPIs)
To: chs@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
From: Greg Oster <oster@cs.usask.ca>
List: netbsd-bugs
Date: 11/20/2007 15:20:02
The following reply was made to PR port-amd64/31359; it has been noted by GNATS.

From: Greg Oster <oster@cs.usask.ca>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-amd64/31359 (SMP amd64 system exhibits excessive cpu shootdown IPIs) 
Date: Tue, 20 Nov 2007 09:16:38 -0600

 ad@netbsd.org writes:
 > Synopsis: SMP amd64 system exhibits excessive cpu shootdown IPIs
 > 
 > State-Changed-From-To: analyzed->feedback
 > State-Changed-By: ad@netbsd.org
 > State-Changed-When: Thu, 15 Nov 2007 21:10:02 +0000
 > State-Changed-Why:
 > It's actually a more general problem in that the amd64 pmap was
 > issuing far too many unnecessary shootdowns. That has been fixed,
 > and the shootdown code re-written to reduce overhead and improve
 > concurrency. 
 > 
 > I still notice "more" shootdowns happening than on i386 with a
 > standard kernel, but I believe that the problem should be (by
 > and large) fixed.
 
 It seems that the "cpux TLB Shootdown IPI" field has been removed 
 from 'vmstat -i', so I can't directly tell what the effect is.
 
 On the same system, with a -j 16 build of the kernel, I see:
 
 global TLB IPI                              7903333       97
 global TLB IPI                              7904305       97
 global TLB IPI                              7904995       97
 global TLB IPI                              7906251       97
 global TLB IPI                              7907301       97
 global TLB IPI                              7908752       97
 global TLB IPI                              7909663       97
 
 from "vmstat -i 1".  In general the number of interrupts seems to 
 have dropped significantly on that box: total interrupts per second 
 are now around 2K, which is definitely down from the 3K/cpu/sec 
 that was seen before!
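 
 As a rough check, assuming the first number on each line is a 
 cumulative count and the samples are one second apart, the 
 per-second "global TLB IPI" rate can be read off as the difference 
 between consecutive totals.  A minimal Python sketch (the helper 
 name is made up for illustration):

```python
def per_interval_deltas(totals):
    """Return the differences between consecutive cumulative counts."""
    return [later - earlier for earlier, later in zip(totals, totals[1:])]


# Cumulative "global TLB IPI" totals copied from the samples above.
# Assumption: one sample per second, as requested by "vmstat -i 1".
samples = [7903333, 7904305, 7904995, 7906251, 7907301, 7908752, 7909663]

deltas = per_interval_deltas(samples)
mean_rate = sum(deltas) / len(deltas)

print(deltas)     # [972, 690, 1256, 1050, 1451, 911]
print(mean_rate)  # 1055.0
```

 For these samples that works out to roughly 1K global TLB IPIs per 
 second.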
 
 On a different box, with a -j 32 build.sh, I see this:
 
 global TLB IPI                             15388140      236
 global TLB IPI                             15395469      236
 global TLB IPI                             15403732      236
 global TLB IPI                             15409438      236
 global TLB IPI                             15414926      236
 
 from "vmstat -i 1".  The total interrupts on the system appear 
 to be at about 10-12K/sec (building to a tmpfs, so the local disk is 
 not in use).  This latter system has 8 cores, as opposed to the 4 on 
 the previous one.  I hate to think what the IPIs would have been with 
 the old code on this system :) 
 
 I think this problem is now fixed -- Thanks! 
 
 Later...
 
 Greg Oster