Subject: Re: mutex fault
To: Andrew Doran <ad@netbsd.org>
From: Kazushi (Jam) Marukawa <jam@pobox.com>
List: port-xen
Date: 11/28/2007 19:00:50
   On Nov 26, 15:06, Andrew Doran wrote:
   > Subject: Re: mutex fault
   > On Mon, Nov 26, 2007 at 08:24:45PM +0900, Kazushi Marukawa wrote:
   > 
   > The last call can be turned into "jmp mutex_exit" by the compiler. Can you
   > try compiling the kernel with -O0? It will not make those optimizations.

Thanks.  I got new crash log from -O0 kernel and it was
different.  There are two functions between
callout_softclock and mutex_vector_exit.  lwp_unlock and
sleepq_timeout functions.

The sleepq_timeout is:

    lwp_lock(l);
    if (l->l_wchan == NULL) {
        lwp_unlock(l);
        return;
    }

Calling lwp_lock(l) and low_unlock(l) in above sequence was
causing my problem.  hehe.  I guess some rare race condition
is causing this, but not sure where.  Is it possible to find
the source of problem from the trace information?  I added
the trace log at the bottom of this mail.

On the other hand, I can put some codes to trap the
condition if a patch to modify source to trap it is
available.  Please let me konw for the case.

Thanks.


Trace log is here:
----
Mutex error: mutex_vector_exit: exiting unheld spin mutex

lock address : 0x00000000c0b4ed80
current cpu  :                  0
current lwp  : 0x00000000ca965e00
owner field  : 0x0000000000000b00 wait/spin:                0/1

panic: lock error
fatal breakpoint trap in supervisor mode
trap type 1 code 0 eip c05ce919 cs 9 eflags 286 cr2 0 ilevel b
Stopped in pid 0.2 (system) at  netbsd:breakpoint+0x1:  ret
db> trace
breakpoint(c08d6542,cb192e34,0,0,cb192e34) at netbsd:breakpoint+0x1
panic(c08d584b,c08d1868,c08d19d1,c08d19e3,c0b4ed80) at netbsd:panic+0xcc
lockdebug_abort(c0b4ed80,c0aa52ec,c08d19d1,c08d19e3,c0b4df06) at netbsd:lockdebug_abort+0x6a
mutex_abort(c0b4ed80,c08d19d1,c08d19e3,0,c0cc4000) at netbsd:mutex_abort+0x42
mutex_vector_exit(c0b4ed80,b360,cb192ed8,c04fd6f2,cd5788c0) at netbsd:mutex_vector_exit+0x43
lwp_unlock(cd5788c0,ca003b9a,989680,c0aeb360,c0aeb360) at 0xc04fd72b
sleepq_timeout(cd5788c0,0,64,c0617c80,1) at netbsd:sleepq_timeout+0x2f
callout_softclock(0,0,cb192fa8,c0b52d00,c180dcc0) at netbsd:callout_softclock+0x537
softintr_dispatch(0,cb192f78,f,0,c04e35d4) at netbsd:softintr_dispatch+0x9a
DDB lost frame for netbsd:Xsoftclock+0x3c, trying 0xcb192f70
Xsoftclock() at netbsd:Xsoftclock+0x3c
--- interrupt ---
--- switch to interrupt stack ---
?(bb784000,0,0,0,0) at 0
db> reboot
syncing disks... Mutex error: mutex_vector_enter: locking against myself

lock address : 0x00000000c0b4df04
current cpu  :                  0
current lwp  : 0x00000000ca965e00
owner field  : 0x0000000000010b00 wait/spin:                0/1

panic: lock error
fatal breakpoint trap in supervisor mode
trap type 1 code 0 eip c05ce919 cs 9 eflags 296 cr2 0 ilevel c
Stopped in pid 0.2 (system) at  netbsd:breakpoint+0x1:  ret
db> trace
breakpoint(c08d6542,cb1929fc,0,0,cb1929fc) at netbsd:breakpoint+0x1
panic(c08d584b,c08d1868,c08d1985,c08d1998,c0b4df04) at netbsd:panic+0xcc
lockdebug_abort(c0b4df04,c0aa52ec,c08d1985,c08d1998,b) at netbsd:lockdebug_abort+0x6a
mutex_abort(c0b4df04,c08d1985,c08d1998,c04ef12d,cd561026) at netbsd:mutex_abort+0x42
mutex_vector_enter(c0b4df04,c,cb192ab0,c0501184,cd379c40) at netbsd:mutex_vector_enter+0x86
lwp_lock(cd379c40,0,cb192ac0,c01f9d29,cd379c40) at 0xc0500251
suspendsched(c08db64a,0,cb192ae0,c01f9dc1,cb192ad8) at netbsd:suspendsched+0x77
vfs_shutdown(cb192b1c,1,cb192b20,c01f8e83,0) at netbsd:vfs_shutdown+0x2c
cpu_reboot(0,0,c01f8463,c0b3c4e0,c0881560) at netbsd:cpu_reboot+0x3f
db_sifting_cmd(1,0,0,cb192b58,8) at netbsd:db_sifting_cmd
db_command(c0a9c5c0,bfeffc04,cb192c20,9,cb192bf4) at netbsd:db_command+0x39c
db_command_loop(c05ce919,0,2,c0aa55ed,cb192c0c) at netbsd:db_command_loop+0x96
db_trap(1,0,58,0,2) at netbsd:db_trap+0x1a2
kdb_trap(1,0,cb192dac,c05ce919,9) at netbsd:kdb_trap+0x161
trap() at netbsd:trap+0x423
--- trap (number 1) ---
breakpoint(c08d6542,cb192e34,0,0,cb192e34) at netbsd:breakpoint+0x1
panic(c08d584b,c08d1868,c08d19d1,c08d19e3,c0b4ed80) at netbsd:panic+0xcc
lockdebug_abort(c0b4ed80,c0aa52ec,c08d19d1,c08d19e3,c0b4df06) at netbsd:lockdebug_abort+0x6a
mutex_abort(c0b4ed80,c08d19d1,c08d19e3,0,c0cc4000) at netbsd:mutex_abort+0x42
mutex_vector_exit(c0b4ed80,b360,cb192ed8,c04fd6f2,cd5788c0) at netbsd:mutex_vector_exit+0x43
lwp_unlock(cd5788c0,ca003b9a,989680,c0aeb360,c0aeb360) at 0xc04fd72b
sleepq_timeout(cd5788c0,0,64,c0617c80,1) at netbsd:sleepq_timeout+0x2f
callout_softclock(0,0,cb192fa8,c0b52d00,c180dcc0) at netbsd:callout_softclock+0x537
softintr_dispatch(0,cb192f78,f,0,c04e35d4) at netbsd:softintr_dispatch+0x9a
DDB lost frame for netbsd:Xsoftclock+0x3c, trying 0xcb192f70
Xsoftclock() at netbsd:Xsoftclock+0x3c
--- interrupt ---
--- switch to interrupt stack ---
?(bb784000,0,0,0,0) at 0
db> reboot
Mutex error: mutex_vector_enter: locking against myself

lock address : 0x00000000c0b4df04
current cpu  :                  0
current lwp  : 0x00000000ca965e00
owner field  : 0x0000000000010b00 wait/spin:                0/1

panic: lock error
fatal breakpoint trap in supervisor mode
trap type 1 code 0 eip c05ce919 cs 9 eflags 282 cr2 0 ilevel b
Stopped in pid 0.2 (system) at  netbsd:breakpoint+0x1:  ret
db> trace
breakpoint(c08d6542,cb191fa8,0,0,cb191fa8) at netbsd:breakpoint+0x1
panic(c08d584b,c08d1868,c08d1985,c08d1998,c0b4df04) at netbsd:panic+0xcc
lockdebug_abort(c0b4df04,c0aa52ec,c08d1985,c08d1998,b) at netbsd:lockdebug_abort+0x6a
mutex_abort(c0b4df04,c08d1985,c08d1998,c04ef12d,c0b4edc6) at netbsd:mutex_abort+0x42
mutex_vector_enter(c0b4df04,2,cb19205c,c0528892,cd22e1c0) at netbsd:mutex_vector_enter+0x86
lwp_lock(cd22e1c0,0,cb19209c,c055032b,2) at 0xc05288e9
selwakeup(c18b494c,0,cb19209c,c054fa93,c18b494c) at netbsd:selwakeup+0x69
selnotify(c18b494c,0,c19af600,c19af600,42) at netbsd:selnotify+0x11
sowakeup(c18b48c8,c18b494c,1,c19af600,1) at netbsd:sowakeup+0x19
udp4_sendup(c18a3e00,1c,cb192188,c18b48c8,c18a3e00) at netbsd:udp4_sendup+0x171
udp4_realinput(cb192188,cb192178,cb1921c4,1c,ca9623c0) at netbsd:udp4_realinput+0x15a
udp_input(c18a3e00,14,11,1,c0b4da44) at netbsd:udp_input+0x223
ip_input(c18a3e00,800,c18a8d00,caf7d030,2) at netbsd:ip_input+0xae7
ipintr(cb192278,cb190011,c05e0031,c01f0011,11) at netbsd:ipintr+0x6d
Xsoftnet() at netbsd:Xsoftnet+0x57
--- interrupt ---
0x9:
db> reboot
Mutex error: mutex_vector_enter: locking against myself

lock address : 0x00000000cd561024
current cpu  :                  0
current lwp  : 0x00000000ca965e00
owner field  : 0x0000000000010b00 wait/spin:                0/1

panic: lock error
fatal breakpoint trap in supervisor mode
trap type 1 code 0 eip c05ce919 cs 9 eflags 292 cr2 0 ilevel b
Stopped in pid 0.2 (system) at  netbsd:breakpoint+0x1:  ret
db> trace
breakpoint(c08d6542,cb191970,0,0,cb191970) at netbsd:breakpoint+0x1
panic(c08d584b,c08d1868,c08d1985,c08d1998,cd561024) at netbsd:panic+0xcc
lockdebug_abort(cd561024,c0aa52ec,c08d1985,c08d1998,b) at netbsd:lockdebug_abort+0x6a
mutex_abort(cd561024,c08d1985,c08d1998,c04ef349,c0b4e536) at netbsd:mutex_abort+0x42
mutex_vector_enter(cd561024,2000000,cb191a24,c05e145c,1000000) at netbsd:mutex_vector_enter+0x86
sched_pstats(0,c19c8418,c18a4200,c1a42000,762cf000) at netbsd:sched_pstats+0x39
callout_softclock(0,c19b2d00,800,c0b52d00,c180dcc0) at netbsd:callout_softclock+0x52a
softintr_dispatch(0,cb191b14,cb190011,c05e0031,c01f0011) at netbsd:softintr_dispatch+0x9a
Xsoftclock() at netbsd:Xsoftclock+0x3c
--- interrupt ---
0x9:
db> reboot
rebooting...

-- Kazushi
Screw up your courage!  You've screwed up everything else.