tech-kern archive


Re: Wild CPU usage times on NetBSD 5



    Date:        Thu, 26 Nov 2009 06:27:07 +0000
    From:        David Holland <dholland-tech%netbsd.org@localhost>
    Message-ID:  <20091126062707.GA841%netbsd.org@localhost>

  | (But I'd expect it to be burning cpu in the kernel, not in userspace.
  | Maybe it's getting into some kind of signal loop? Any chance of
  | catching it under ktrace?)

Yes, sure.  When this happens (at least with the spamassassin one) the
processes easily last long enough, and I get enough spam mail needing to
be filtered that it just runs over and over again; sometimes the total
elapsed time for processing a batch of an hour's incoming mail is 30-40
minutes.  Picking a process and ktracing it during that time should be
easy.
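
Before reaching for ktrace, a quick look at whether the time is user or
system time would already say something about the "kernel, not userspace"
question.  A minimal sketch in Python, assuming the slow command can simply
be re-run by hand; cpu_split is a made-up helper name, not anything from
the tools discussed here:

```python
import resource
import subprocess
import sys


def cpu_split(argv):
    """Run argv to completion; return (user_seconds, system_seconds) it used.

    RUSAGE_CHILDREN only covers children that have been waited for,
    which subprocess.run() does before returning.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(argv, check=False)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return (after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime)


if __name__ == "__main__":
    # Default command is only a placeholder workload for illustration.
    cmd = sys.argv[1:] or [sys.executable, "-c", "sum(range(10**6))"]
    user, system = cpu_split(cmd)
    print(f"user {user:.2f}s  system {system:.2f}s")
```

A large user figure with little system time would point away from the
kernel; the reverse would make a ktrace of the process more interesting.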

Magicpoint is less easy, as when that happens I usually have no time to
think about the system much; there are usually a bunch of students trying
to avoid falling asleep while listening to my droning on...   I guess it
also happens sometimes when I'm just previewing, but there I notice it less,
as it isn't so important that the slide shows up immediately.  When it has
a delay it's only 5-10 secs, but that's enough for me to have wondered if
I did actually ask for the next slide, and so ask again, and again, then
have to go back one, or two, or three times, each of which takes 5-10
secs, and the whole thing becomes a mess until I eventually just
stop and let it catch up, so I can find out where I ended up leaving the
slide set positioned after all that typeahead.

The shell one I could ktrace easily as well, next time it happens, provided
I notice it.  That one takes so long at the best of times that I'm never
actually watching and waiting for it to finish, so whether it takes its
usual 5 mins or so, or an hour, I usually don't even see the difference
(just sometime later I see that it did in fact finish).

You're certainly right that I am not getting any kind of deadlock, in fact
not any kind of actual error - everything works, and works properly, just
slowly...   But your "already fixed" would not necessarily apply: I was
running 5.0.1 (and before that, briefly, 5.0, and 4.99.various earlier
than that).  I am now running 5_STABLE (up to date as of yesterday), and
since then I have not seen the problem (which, so far, means nothing).

If it were just perl & magicpoint, I'd suspect perhaps the floating point
unit had stopped, and all fp ops were trapping into user space to be
processed, but I can't see the shell doing enough fp to be slowed down
(not only is sh not a heavy fp process in any case, but the workload
here is all string processing - that is, unless the i386 (well, 686)
uses the fp unit to perform string operations, and what might normally be one
instruction is trapping).

Signals are possible - and I would not be surprised at either perl or
magicpoint trying to process some, but I doubt the shell is doing anything
weird with signals, almost any signal (but SIGCHLD) should just cause it to
exit if delivered to user space (the process runs in bg usually, so it should
be just ignoring SIGINT).  The shell wouldn't be running many processes,
and perl (spamassassin) and magicpoint should be running none, so there
shouldn't be a SIGCHLD problem (well, if everything is working.)
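
The "bg process ignores SIGINT" assumption is easy enough to check rather
than trust.  A rough sketch in Python, assuming a POSIX sh on PATH with job
control off (the usual case for sh -c); bg_sigint_disposition is a name
invented for this illustration:

```python
import subprocess
import sys

# Child code: report how SIGINT was set up when this process started.
CHILD = (
    "import signal\n"
    "h = signal.getsignal(signal.SIGINT)\n"
    "print('ignored' if h == signal.SIG_IGN else\n"
    "      'default' if h == signal.SIG_DFL else 'handled')\n"
)


def bg_sigint_disposition():
    """Start a child with '&' from a non-interactive sh and return what
    the child reports as its SIGINT disposition."""
    # $0 is the interpreter path and $1 the child code, passed as
    # arguments so that no extra shell quoting is needed.
    result = subprocess.run(
        ["sh", "-c", '"$0" -c "$1" & wait', sys.executable, CHILD],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(bg_sigint_disposition())
```

If that reports anything other than the signal being ignored, the reasoning
above about SIGINT would need another look.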

Next time I observe it I'll go hunting.

kre


