Subject: Native pthreads issue with MySQL replication.
To: NetBSD-current Discussion List <current-users@NetBSD.ORG>
From: Andrew Gillham <gillham@vaultron.com>
List: current-users
Date: 10/27/2003 23:28:38
I'm using the pkgsrc mysql-server with native pthreads on my -current system
(MySQL 3.23.58 on an i386 1.6ZE SMP box).

I have mysql set up as a slave replication server, and it is running fine,
except when shutting down or restarting.  It seems to shut down ok when not
doing replication.

Apparently the replication thread is not shutting down correctly, so the
mysqld process hangs and I have to use 'kill -9' to clean up.

The box is idle, so it is not a load issue, and I can easily reproduce it.

The mysql 'show processlist' command looks like this:
+--+------+---------+--+-------+----+---------------------+----------------+
|Id|User  |Host     |db|Command|Time|State                |Info            |
+--+------+---------+--+-------+----+---------------------+----------------+
|1 |system|none     |  |Connect|2   |Reading master update|                |
|2 |root  |localhost|  |Query  |0   |                     |show processlist|
+--+------+---------+--+-------+----+---------------------+----------------+

The normally running process:
  PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
11798 mysql     18    0    10M 2716K sigwai/0   0:00  0.00%  0.00% mysqld

USER    PID %CPU %MEM   VSZ  RSS TT STAT STARTED    TIME COMMAND
mysql 11798  0.0  0.5 10012 2716 p0 Sa   10:44PM 0:00.07 /usr/pkg/.../mysqld

When I run 'mysqladmin shutdown' the state of the process doesn't change,
and it doesn't exit.

Running mysqld under gdb didn't help.  I ended up panicking the box after
doing 'kill -ABRT' on the mysql process, and then 'quit' in gdb.
The panic:
login: uvm_fault(0xe34a53c0, 0, 0, 1) -> 0xe
kernel: page fault trap, code=0
Stopped in pid 1772.1 (gdb) at  netbsd:kpsignal2+0x11b: testl   %eax,0(%ebx,%edx,4)
db{0}> tr
kpsignal2(e3e879cc,e3f03e64,1,e3e87d0c,0) at netbsd:kpsignal2+0x11b
psignal1(e3e879cc,1,1,e3e879cc,0) at netbsd:psignal1+0x29
orphanpg(e34df140,0,e3f03eec,c035b593,c076afa8) at netbsd:orphanpg+0x33
fixjobc(e3e87d0c,e34df100,0,e34a2fd8,0) at netbsd:fixjobc+0x76
exit1(e34a8e58,0,0,0,e3f03f5c) at netbsd:exit1+0x13f
sys_exit(e34a8e58,e3f03f64,e3f03f5c,1,7) at netbsd:sys_exit+0x23
syscall_plain(e3f03fa8,1f,1f,821001f,bfbf001f) at netbsd:syscall_plain+0x173
db{0}>

Anyway, I have a slightly older version of MySQL built statically with the
mit-pthreads included with MySQL, and it works correctly, so I would
guess the problem is related to the native pthreads.  Perhaps signal related?
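
One way to test the signal theory outside of MySQL might be a small probe
like the one below.  This is only a sketch, not MySQL code: it assumes
(unconfirmed) that the hang involves a thread blocked in read() that is
supposed to be woken by a thread-directed signal.  The names
probe_signal_wakeup, reader_thread and wakeup_handler are made up for this
example, and SIGUSR1 and the pipe are stand-ins for whatever mysqld really
uses.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical probe (not MySQL code): can pthread_kill() wake a thread
 * that is blocked in read()? */

static int pipe_fds[2];                      /* nothing is ever written here */
static volatile sig_atomic_t stop_requested; /* set by the signal handler */
static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static int reader_done;                      /* protected by done_lock */

static void wakeup_handler(int signo)
{
    (void)signo;
    stop_requested = 1;
}

static void sleep_ms(long ms)
{
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

static void *reader_thread(void *arg)
{
    char buf[16];
    (void)arg;
    while (!stop_requested) {
        ssize_t n = read(pipe_fds[0], buf, sizeof buf);
        if (n == 0)
            break;                           /* EOF: writer side was closed */
        if (n < 0 && errno != EINTR)
            break;                           /* unexpected error */
        /* on EINTR, loop and re-check stop_requested */
    }
    pthread_mutex_lock(&done_lock);
    reader_done = 1;
    pthread_mutex_unlock(&done_lock);
    return NULL;
}

static int reader_has_exited(void)
{
    int done;
    pthread_mutex_lock(&done_lock);
    done = reader_done;
    pthread_mutex_unlock(&done_lock);
    return done;
}

/* Returns 0 if the signal woke the reader, -1 if it stayed blocked,
 * -2 if the probe could not be set up. */
int probe_signal_wakeup(void)
{
    struct sigaction sa;
    pthread_t tid;
    int i, exited = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = wakeup_handler;          /* no SA_RESTART: read() gets EINTR */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0 || pipe(pipe_fds) != 0)
        return -2;
    stop_requested = 0;
    reader_done = 0;
    if (pthread_create(&tid, NULL, reader_thread, NULL) != 0)
        return -2;

    sleep_ms(200);                           /* let the reader block in read() */
    for (i = 0; i < 20 && !exited; i++) {
        pthread_kill(tid, SIGUSR1);          /* resend in case a signal raced */
        sleep_ms(100);
        exited = reader_has_exited();
    }

    close(pipe_fds[1]);                      /* EOF unblocks a stuck reader */
    pthread_join(tid, NULL);
    close(pipe_fds[0]);
    return exited ? 0 : -1;
}
```

Calling probe_signal_wakeup() from a small main() (built with something
like 'cc -pthread') and checking the return value should show whether the
thread library delivers the signal at all.  A 0 here would not rule out a
signal problem in mysqld; a -1 would at least point at the pthreads side
rather than at MySQL.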

If anyone is using MySQL replication with native pthreads, or has any ideas
on how to debug this a bit more, please let me know.

-Andrew