Current-Users archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: Dramatic mysqld slowness when running compiles in parallel



Hi,

Sorry to keep harping on this but the problem still exists in today's current.  
In addition to the tests from a single processor Thinkpad T42

(see http://mail-index.netbsd.org/current-users/2008/07/17/msg003575.html)

I have now reproduced the problem on a two processor Dell PowerEdge 2650.  On 
the latter, with both processors, the issue much less severe but still very 
noticeable if I increase the load.  When I disable one processor the issue 
shows up immediately.

With the NetBSD 5 branch approaching, it seems to me this issue would be worth 
tracking down.

To reproduce the problem, install mysq5-server, create a simple test table as 
described in my initial report, place ~200 (identical) insert statements into 
the file 'sql-file' and run

        time mysql test < sql-file

Repeat the test a few times while compiling, e.g., /usr/pkgsrc/shells/bash and 
notice the slowdown by factors of one hundred and more.

Some additional observations since my last set of tests:

(1) I used time(1) to time mysqld while running the tests.  The only 
difference is that the real time is much larger in the "slow" case.  User and 
sys time is the same for both slow and fast execution.  This is supported by 
profiling results for mysqld where I see no difference between the slow and 
the fast case.

(2) Here is another "ktruss -T -tA mysqld" segment from a particularly slow 
insert (I have sorted the output by the time column):
 15638     11 mysqld   1223051925.586724189 _lwp_kill(0xa, 0xe) called
 15638     11 mysqld   1223051925.586729218 _lwp_kill(0xa, 0xe) = 0
 15638     10 mysqld   1223051925.586734805 resume kernel
 15638     10 mysqld   1223051925.586737320 __sigtimedwait(0xb6fffd88, 
0xb6fffcc4, 0) = 0
 15638     10 mysqld   1223051925.586741231 __sigprocmask14(0x3, 0x8570edc, 
0xb6fffd30) called
 15638     10 mysqld   1223051925.586742907 __sigprocmask14(0x3, 0x8570edc, 
0xb6fffd30) = 0
 15638     10 mysqld   1223051925.586745701 _lwp_park(0, 0, 0x8570eb8, 
0x8570eb8) called
 15638     10 mysqld   1223051925.586748494 stop kernel
(*** notice the large time gap here)
 15638      7 mysqld   1223051926.126986804 resume kernel
 15638      7 mysqld   1223051926.126990157 select(0, 0, 0, 0, 0xb87ff864) = 0
 15638      7 mysqld   1223051926.126999935 gettimeofday(0xb87ff860, 0) called
 15638      7 mysqld   1223051926.127002449 gettimeofday(0xb87ff860, 0) = 0
 15638      7 mysqld   1223051926.127244943 select(0, 0, 0, 0, 0xb87ff864) 
called
 15638      7 mysqld   1223051926.127250251 stop kernel
 15638      8 mysqld   1223051926.127255839 resume kernel
 15638      8 mysqld   1223051926.127257794 select(0, 0, 0, 0, 0xb85fff64) = 0
 15638      8 mysqld   1223051926.127262823 gettimeofday(0xb85fff60, 0) called
 15638      8 mysqld   1223051926.127265617 gettimeofday(0xb85fff60, 0) = 0
 15638      8 mysqld   1223051926.127813184 select(0, 0, 0, 0, 0xb85fff64) 
called
 15638      8 mysqld   1223051926.127815139 stop kernel
(*** and here)
 15638     11 mysqld   1223051926.547139687 _lwp_unpark(0xa, 0x8570eb8) called
 15638     11 mysqld   1223051926.547146392 _lwp_unpark(0xa, 0x8570eb8) = 0

The same part from a fast insert looks like:
  4792     11 mysqld   1223050783.748332054 _lwp_kill(0xa, 0xe) called
  4792     11 mysqld   1223050783.748335127 _lwp_kill(0xa, 0xe) = 0
  4792     10 mysqld   1223050783.748340714 resume kernel
  4792     10 mysqld   1223050783.748343229 __sigtimedwait(0xb6bffd88, 
0xb6bffcc4, 0) = 0
  4792     10 mysqld   1223050783.748347420 __sigprocmask14(0x3, 0x8570edc, 
0xb6bffd30) called
  4792     10 mysqld   1223050783.748349096 __sigprocmask14(0x3, 0x8570edc, 
0xb6bffd30) = 0
  4792     10 mysqld   1223050783.748351889 _lwp_park(0, 0, 0x8570eb8, 
0x8570eb8) called
  4792     10 mysqld   1223050783.748354124 stop kernel
(*** this is where the two time gaps occur in the slow case -- no gap here!)
  4792     11 mysqld   1223050783.748895545 _lwp_unpark(0xa, 0x8570eb8) called
  4792     11 mysqld   1223050783.748898060 _lwp_unpark(0xa, 0x8570eb8) = 0

My plan is to look at revivesa to see if it behaves similarly (but if somebody 
beats me to it I would not mind).

I'd appreciate any suggestions, comments, and other help with this.

Thanks,
Sverre

PS  The _lwp_unpark calls appear to come from the libpthread functions 
pthread__mutex_unlock_slow or pthread__mutex_wakeup.  I tried to trace the 
call stack by recompiling libpthread with -g, attaching gdb to the mysqld 
process, and setting breakpoints at the two functions.  When I did that, 
however, gdb would lock up after a couple of continues and I was not able to 
get meaningful results.


Home | Main Index | Thread Index | Old Index