tech-kern archive


Re: netbsd-4 vs -current web server performance



On Wed, Sep 3, 2008 at 3:54 PM, Jeff Rizzo <riz%netbsd.org@localhost> wrote:
> matthew sporleder wrote:
>> On Wed, Sep 3, 2008 at 3:08 PM, Jeff Rizzo <riz%netbsd.org@localhost> wrote:
>>
>>> Thor Lancelot Simon wrote:
>>>
>>>> On Tue, Sep 02, 2008 at 02:52:33PM -0700, Jeff Rizzo wrote:
>>>>
>>>>>   PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
>>>>>  4742 www       42    0    17M  314M parked/2 233:14 99.02% 99.02% httpd
>>>>>  3265 www       39    0    17M  266M parked/0 164:34 99.02% 99.02% httpd
>>>>>  8844 www       40    0    17M  298M parked/0 158:27 99.02% 99.02% httpd
>>>>>
>>>>>
>>>> Parked *and* 100% of a CPU?
>>>>
>>>> That seems just wrong.
>>>>
>>>> Thor
>>>>
>>>>
>>> I would tend to agree - but I don't know all that much about the new
>>> states.  This issue is fairly reproducible (within a few hours) - and
>>> also seems to go away (for a few hours) after restarting apache.  I have
>>> a couple weeks to get this fixed, since even at its worst it's not too
>>> bad, and in any event I have multiple webservers.
>>>
>>>
>>
>>
>> Do the httpd's slowly increase until they get parked at 99% or do they
>> suddenly spike?
>>
>>
>
> Almost impossible to tell - we have enough web traffic to keep a couple
> servers busy at our slow times, so the line between "noticeably, but
> normally busy" and "wedged" is hard to demarcate - until they get into
> the 'parked' state.
>
>> If it's a slow increase, could you start monitoring various stats like
>> the number of threads/PID, memory growth, etc?
>>
>
> It hadn't occurred to me to look at specific threads - I am currently
> seeing one thread on one machine that's using about 90% CPU, and it's
> always on one of the CPUs.  That *process* is showing in 'parked' and is
> using 90% of a CPU as well.  (I've reset everything in the last couple
> hours, so this looks like the "bad state" starting to happen)  I don't
> see any single thread on the other -current webserver getting more than
> 12-13% of a CPU for more than one 5-sec interval.
>
> Interestingly, memory usage on the -current boxes seems lower than the
> -4 box, if top is to be believed:
>
> netbsd-4:
> Memory: 7770M Act, 35M Inact, 9188K Wired, 19M Exec, 7062M File, 6924M Free
> Swap: 1024M Total, 1024M Free
>
> current (1):
> Memory: 1286M Act, 502M Inact, 12M Wired, 14M Exec, 106M File, 14G Free
> Swap: 4096M Total, 4096M Free
>
> current(2):
> Memory: 1289M Act, 747M Inact, 11M Wired, 14M Exec, 750M File, 13G Free
> Swap: 4096M Total, 4096M Free
>
>> If it's a sudden spike, is there a way you can run ktrace or something
>> until you can start to isolate the trigger?
>>
>
> Kinda hard.  Real busy.  And it's definitely _not_ sudden - looks like
> problem threads just accumulate until the whole server slows down.
>


I'd suggest putting something like

  ps axswwu >> ps.out

into cron to run every minute or so; then you can get a better idea of
the growth patterns for specific threads.
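Once ps.out has a few hours of snapshots in it, something like the following can boil it down per PID. This is a minimal sketch under assumptions not stated in the thread: each appended snapshot starts with a header line beginning with "USER", the header includes "PID" and "RSS" columns, COMMAND is the last column, and with -s each LWP gets its own line. The function names are hypothetical.

```python
"""Summarise per-PID growth from repeated `ps axswwu >> ps.out` snapshots.

Assumptions (not from the original thread): every snapshot begins with a
header whose first token is "USER", the header has "PID" and "RSS"
columns, COMMAND is the last column (it may contain spaces), and with -s
each LWP is printed on its own line.
"""
from collections import defaultdict


def parse_snapshots(lines):
    """Yield one list of row dicts per snapshot."""
    header, rows = None, []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "USER":          # a repeated header starts a new snapshot
            if header is not None:
                yield rows
            header, rows = tokens, []
            continue
        if header is None:
            continue                      # ignore anything before the first header
        # Split into exactly len(header) fields so COMMAND keeps its spaces.
        fields = line.rstrip().split(None, len(header) - 1)
        rows.append(dict(zip(header, fields)))
    if header is not None:
        yield rows


def growth(snapshots, command="httpd"):
    """Return {pid: [(snapshot_index, lwp_rows, max_rss), ...]} for `command`."""
    history = defaultdict(list)
    for i, snap in enumerate(snapshots):
        per_pid = defaultdict(lambda: [0, 0])
        for row in snap:
            if command not in row.get("COMMAND", ""):
                continue
            entry = per_pid[row["PID"]]
            entry[0] += 1                            # one row per LWP with -s
            entry[1] = max(entry[1], int(row["RSS"]))  # RSS as printed by ps
        for pid, (count, rss) in per_pid.items():
            history[pid].append((i, count, rss))
    return dict(history)


def report(path="ps.out", command="httpd"):
    """Print one line per PID: snapshot number, LWP count and RSS."""
    with open(path) as f:
        for pid, points in sorted(growth(parse_snapshots(f), command).items()):
            print(pid, " ".join(f"#{i}:{n}lwp/{rss}K" for i, n, rss in points))
```

Calling report("ps.out") shows whether a given httpd's LWP count or RSS climbs steadily over time or jumps all at once, which is the slow-increase vs. sudden-spike question from earlier in the thread.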

Also add %{tid}P or %{hextid}P to Apache's CustomLog format (and
definitely include %T, the time taken to serve the request) and see if
you can match up a single event to a hung thread.  (The last request of
a given thread may get stuck and never return; you might be able to find
a sequence.)

It sounds like a code path in your website is causing occasional hung
threads, and those either spin until the process goes parked, or
accumulate as Apache spawns new workers until the process goes parked.
(The ps monitor will show this.)

Can you take this box out of rotation and reproduce this with jmeter or
ab (or whatever tool you like)?
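If neither ab nor jmeter is handy on the test box, a few lines of Python can generate concurrent load as well. This is a rough sketch, not a replacement for either tool; hammer, summarise and the URL in the usage line are hypothetical names chosen for illustration, and errors are simply allowed to propagate.

```python
"""Tiny concurrent HTTP load generator: a rough stand-in for ab/jmeter."""
import math
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url, timeout=30.0):
    """GET one URL, read the whole body, and return elapsed seconds."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
    return time.monotonic() - start


def hammer(url, total=1000, concurrency=20):
    """Send `total` GETs from `concurrency` threads; return the latencies.

    Any HTTP or network error propagates out (no retries, no error counts).
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(fetch, [url] * total))


def percentile(ordered, q):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def summarise(latencies):
    """Count, median, 99th percentile and max of a list of latencies."""
    ordered = sorted(latencies)
    return {"n": len(ordered),
            "median": percentile(ordered, 0.5),
            "p99": percentile(ordered, 0.99),
            "max": ordered[-1]}
```

For example, summarise(hammer("http://test-box.example/page.php", total=5000, concurrency=50)) against the suspect page; a p99 or max that keeps growing between runs, while ps shows httpd LWPs piling up, would be consistent with the hung-thread theory.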

