Port-vax archive


Re: Performance problems.



Hi. Thanks for the input...

Matt Thomas wrote:
On May 5, 2010, at 9:09 AM, Johnny Billquist wrote:

Tobias Nygren wrote:
On Wed, 05 May 2010 12:11:59 +0200
Johnny Billquist <bqt%softjar.se@localhost> wrote:
Anyone else seen this, knows what could be affecting it, or care?
Or are all others just using modern fast machines, where all this doesn't really 
make a blip on the radar anyway, or else is not using their machines enough to 
make this visible?
My gut feeling is that even on comparatively fast sparc64 hardware more
cycles are spent these days doing kernel work. But it may simply be
that the accounting in top(1) is broken / calculated differently from
how it used to be done. Benchmark results of current vs. 2.0 on low
powered uniprocessor systems would be very interesting to see.
I'm trying to look at this now, but what I'm looking at right now doesn't seem to 
have any bearing on sparcs, sorry.

That said, maybe someone can help me with a few details here. I'm looking at 
the softint code. Currently, there exists a __HAVE_FAST_SOFTINT option for the 
kernel. The only two architectures that seem to have defined this are x86 and 
VAX.
So far, so good.

But looking further into kern/kern_softint.c, I see a number of calls to 
softint_init_isr, where the second parameter is said to be the interrupt 
priority. The comments and descriptions about soft interrupt priorities say:

*       The four priority levels map directly to scheduler priority
*       levels, and where the architecture implements 'fast' software
*       interrupts, they also map onto interrupt priorities.  The
*       interrupt priorities are intended to be hidden from machine
*       independent code, which should use thread-safe mechanisms to
*       synchronize with software interrupts (for example: mutexes).

IPL_SOFTCLOCK is what that is referring to and those are defined correctly.

So the sentence "they also map onto interrupt priorities" should not be taken to mean that they are the same as interrupt priorities?
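One way to picture the distinction being asked about is the sketch below. It is only an illustration under stated assumptions, not the kernel's actual tables: every `demo_` name and every numeric value is hypothetical. It shows each of the four soft interrupt levels carrying two separate numbers, a scheduler priority that machine-independent code can define and an interrupt priority level that each port defines for itself.

```c
#include <stddef.h>

/* The four soft interrupt levels the quoted comment talks about.
 * Names are hypothetical stand-ins, not the kernel's identifiers. */
enum demo_softint_level {
    DEMO_SOFT_CLOCK,
    DEMO_SOFT_BIO,
    DEMO_SOFT_NET,
    DEMO_SOFT_SERIAL,
    DEMO_SOFT_COUNT
};

struct demo_softint_map {
    int sched_pri;  /* scheduler priority: machine independent (made-up value) */
    int ipl;        /* interrupt priority level: machine dependent (made-up value) */
};

/* Hypothetical mapping: the two columns rise together, but the numbers
 * come from different namespaces and are not interchangeable. */
const struct demo_softint_map demo_map[DEMO_SOFT_COUNT] = {
    [DEMO_SOFT_CLOCK]  = { 100, 1 },
    [DEMO_SOFT_BIO]    = { 101, 2 },
    [DEMO_SOFT_NET]    = { 102, 3 },
    [DEMO_SOFT_SERIAL] = { 103, 4 },
};

/* Scheduler priority for a soft interrupt level. */
int demo_sched_pri(enum demo_softint_level level)
{
    return demo_map[level].sched_pri;
}

/* Interrupt priority level for the same soft interrupt level. */
int demo_ipl(enum demo_softint_level level)
{
    return demo_map[level].ipl;
}
```

Read this way, "map onto" means the levels correspond one to one, not that a scheduler priority can be passed where an interrupt priority is expected.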

Am I reading and understanding things wrong here, or is this fundamentally 
wrong in design? If the priority levels should match the interrupt levels in 
the case of fast soft interrupts, I can't understand why the definitions are in 
the MI files. It should be pretty machine dependent...

The former.

:-)

Since only the VAX and x86 are using fast soft interrupts, I can see that the 
code might have been written so that it works on the x86. Since no one else uses 
it, no one else gets hit by this but the VAX.

Actually, MIPS is using them now too.

Ah. I was searching through the arch/*/include directories for the definition, and didn't find it for MIPS. Oh well, not that important anyway, since I think I was barking up the wrong tree.

By the way, I did try changing PRI_SOFTCLOCK to IPL_SOFTCLOCK in kern_softint.c. The system worked just fine anyway, but it didn't make my observed problems go away either, so that doesn't seem to have been the problem. :-(

If anyone can shed some more light on this, please do. In the meanwhile, I'll 
just do some blind testing and see if changing this will help...

I still have the problem, and am still interested in finding out what it is. This is a serious issue, and one that is currently hurting performance seriously on the 8650, and I would suspect it also hurts other machines, even if it perhaps is less visible. But there is definitely a problem in there somewhere.

My observation points so far:

1) The clock of the system goes haywire after a while. So much so that ntp is never able to sync to a time source again.
2) After running a couple of cvs updates in parallel, along with a build.sh, doing a shutdown -r takes close to 10 minutes to get the machine to reboot. I don't know what it is doing, but it's definitely slower than molasses.

These two are very visible, tangible and easily repeatable problems.

My third one is general performance. The system seems to be spending excessive time in system mode. Yes, doing something CPU intensive will cause the system to run mostly in user mode, but doing anything that requires, for instance, much disk access directly pushes system time up over 50%, often hitting around 80% system time, which seems like way too much.

This is a bit harder to define. After all, how much time is to be expected? And how long should various commands take? I plan to install NetBSD 2 to make relative comparisons between the two, which will help. But for now, it's mostly just a feeling.
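To turn the feeling into a number that can be compared across releases, something like the following could be run on both systems. It is a minimal sketch, not a proper benchmark: the helper names, the 8 KB block size and the write-then-fsync workload are all assumptions made for illustration. It uses getrusage(2), so it only sees time charged to the calling process, not the system-wide figure that top(1) reports.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <unistd.h>

/* Convert a struct timeval to seconds. */
double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Fraction of the CPU time consumed between two samples that was spent
 * in system mode. Returns -1.0 if no CPU time was consumed at all. */
double system_share(const struct rusage *before, const struct rusage *after)
{
    double user = tv_seconds(after->ru_utime) - tv_seconds(before->ru_utime);
    double sys = tv_seconds(after->ru_stime) - tv_seconds(before->ru_stime);
    double total = user + sys;

    return total > 0.0 ? sys / total : -1.0;
}

/* Hypothetical disk-heavy workload: write nblocks 8 KB blocks to path,
 * force them to disk, then remove the file. Returns 0 on success. */
int disk_workload(const char *path, int nblocks)
{
    char buf[8192];
    int fd, i, rc = 0;

    memset(buf, 'x', sizeof buf);
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    for (i = 0; i < nblocks; i++) {
        if (write(fd, buf, sizeof buf) != (ssize_t)sizeof buf) {
            rc = -1;
            break;
        }
    }
    if (rc == 0 && fsync(fd) != 0)
        rc = -1;
    close(fd);
    unlink(path);
    return rc;
}

/* Run the workload and return the system-mode share of this process's
 * CPU time, or -1.0 on failure or if no CPU time was measured. */
double measure_system_share(const char *path, int nblocks)
{
    struct rusage before, after;

    if (getrusage(RUSAGE_SELF, &before) != 0)
        return -1.0;
    if (disk_workload(path, nblocks) != 0)
        return -1.0;
    if (getrusage(RUSAGE_SELF, &after) != 0)
        return -1.0;
    return system_share(&before, &after);
}
```

Calling measure_system_share() with the same scratch path and block count on each release, several times over, would give a rough relative figure. Time spent in interrupt handlers that is not charged to the process will not show up here.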

As usual, any hints, ideas, or help would be appreciated.

        Johnny

