Subject: Re: mutex fault
To: Kazushi Marukawa <jam@pobox.com>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: tech-kern
Date: 12/07/2007 12:11:23
[ moved to tech-kern to get wider audience ]

Hi,
does anyone with some clues about locking have some comments about this ?
I've also seen this on a xenamd64 system. This system has raidframe
and LFS (the LFS is not on a RAID partition) if that matters.

Kazushi, do you use raidframe or LFS on your system ?


On Wed, Nov 28, 2007 at 07:00:50PM +0900, Kazushi Marukawa wrote:
> Thanks.  I got new crash log from -O0 kernel and it was
> different.  There are two functions between
> callout_softclock and mutex_vector_exit.  lwp_unlock and
> sleepq_timeout functions.
> 
> The sleepq_timeout is:
> 
>     lwp_lock(l);
>     if (l->l_wchan == NULL) {
>         lwp_unlock(l);
>         return;
>     }
> 
> Calling lwp_lock(l) and low_unlock(l) in above sequence was
> causing my problem.  hehe.  I guess some rare race condition
> is causing this, but not sure where.  Is it possible to find
> the source of problem from the trace information?  I added
> the trace log at the bottom of this mail.
> 
> On the other hand, I can put some codes to trap the
> condition if a patch to modify source to trap it is
> available.  Please let me konw for the case.
> 
> Thanks.
> 
> 
> Trace log is here:
> ----
> Mutex error: mutex_vector_exit: exiting unheld spin mutex
> 
> lock address : 0x00000000c0b4ed80
> current cpu  :                  0
> current lwp  : 0x00000000ca965e00
> owner field  : 0x0000000000000b00 wait/spin:                0/1
> 
> panic: lock error
> fatal breakpoint trap in supervisor mode
> trap type 1 code 0 eip c05ce919 cs 9 eflags 286 cr2 0 ilevel b
> Stopped in pid 0.2 (system) at  netbsd:breakpoint+0x1:  ret
> db> trace
> breakpoint(c08d6542,cb192e34,0,0,cb192e34) at netbsd:breakpoint+0x1
> panic(c08d584b,c08d1868,c08d19d1,c08d19e3,c0b4ed80) at netbsd:panic+0xcc
> lockdebug_abort(c0b4ed80,c0aa52ec,c08d19d1,c08d19e3,c0b4df06) at netbsd:lockdebug_abort+0x6a
> mutex_abort(c0b4ed80,c08d19d1,c08d19e3,0,c0cc4000) at netbsd:mutex_abort+0x42
> mutex_vector_exit(c0b4ed80,b360,cb192ed8,c04fd6f2,cd5788c0) at netbsd:mutex_vector_exit+0x43
> lwp_unlock(cd5788c0,ca003b9a,989680,c0aeb360,c0aeb360) at 0xc04fd72b
> sleepq_timeout(cd5788c0,0,64,c0617c80,1) at netbsd:sleepq_timeout+0x2f
> callout_softclock(0,0,cb192fa8,c0b52d00,c180dcc0) at netbsd:callout_softclock+0x537
> softintr_dispatch(0,cb192f78,f,0,c04e35d4) at netbsd:softintr_dispatch+0x9a
> DDB lost frame for netbsd:Xsoftclock+0x3c, trying 0xcb192f70
> Xsoftclock() at netbsd:Xsoftclock+0x3c
> --- interrupt ---
> --- switch to interrupt stack ---
> ?(bb784000,0,0,0,0) at 0
> db> reboot
> syncing disks... Mutex error: mutex_vector_enter: locking against myself
> 
> lock address : 0x00000000c0b4df04
> current cpu  :                  0
> current lwp  : 0x00000000ca965e00
> owner field  : 0x0000000000010b00 wait/spin:                0/1
> 
> panic: lock error
> fatal breakpoint trap in supervisor mode
> trap type 1 code 0 eip c05ce919 cs 9 eflags 296 cr2 0 ilevel c
> Stopped in pid 0.2 (system) at  netbsd:breakpoint+0x1:  ret
> db> trace
> breakpoint(c08d6542,cb1929fc,0,0,cb1929fc) at netbsd:breakpoint+0x1
> panic(c08d584b,c08d1868,c08d1985,c08d1998,c0b4df04) at netbsd:panic+0xcc
> lockdebug_abort(c0b4df04,c0aa52ec,c08d1985,c08d1998,b) at netbsd:lockdebug_abort+0x6a
> mutex_abort(c0b4df04,c08d1985,c08d1998,c04ef12d,cd561026) at netbsd:mutex_abort+0x42
> mutex_vector_enter(c0b4df04,c,cb192ab0,c0501184,cd379c40) at netbsd:mutex_vector_enter+0x86
> lwp_lock(cd379c40,0,cb192ac0,c01f9d29,cd379c40) at 0xc0500251
> suspendsched(c08db64a,0,cb192ae0,c01f9dc1,cb192ad8) at netbsd:suspendsched+0x77
> vfs_shutdown(cb192b1c,1,cb192b20,c01f8e83,0) at netbsd:vfs_shutdown+0x2c
> cpu_reboot(0,0,c01f8463,c0b3c4e0,c0881560) at netbsd:cpu_reboot+0x3f
> db_sifting_cmd(1,0,0,cb192b58,8) at netbsd:db_sifting_cmd
> db_command(c0a9c5c0,bfeffc04,cb192c20,9,cb192bf4) at netbsd:db_command+0x39c
> db_command_loop(c05ce919,0,2,c0aa55ed,cb192c0c) at netbsd:db_command_loop+0x96
> db_trap(1,0,58,0,2) at netbsd:db_trap+0x1a2
> kdb_trap(1,0,cb192dac,c05ce919,9) at netbsd:kdb_trap+0x161
> trap() at netbsd:trap+0x423
> --- trap (number 1) ---
> breakpoint(c08d6542,cb192e34,0,0,cb192e34) at netbsd:breakpoint+0x1
> panic(c08d584b,c08d1868,c08d19d1,c08d19e3,c0b4ed80) at netbsd:panic+0xcc
> lockdebug_abort(c0b4ed80,c0aa52ec,c08d19d1,c08d19e3,c0b4df06) at netbsd:lockdebug_abort+0x6a
> mutex_abort(c0b4ed80,c08d19d1,c08d19e3,0,c0cc4000) at netbsd:mutex_abort+0x42
> mutex_vector_exit(c0b4ed80,b360,cb192ed8,c04fd6f2,cd5788c0) at netbsd:mutex_vector_exit+0x43
> lwp_unlock(cd5788c0,ca003b9a,989680,c0aeb360,c0aeb360) at 0xc04fd72b
> sleepq_timeout(cd5788c0,0,64,c0617c80,1) at netbsd:sleepq_timeout+0x2f
> callout_softclock(0,0,cb192fa8,c0b52d00,c180dcc0) at netbsd:callout_softclock+0x537
> softintr_dispatch(0,cb192f78,f,0,c04e35d4) at netbsd:softintr_dispatch+0x9a
> DDB lost frame for netbsd:Xsoftclock+0x3c, trying 0xcb192f70
> Xsoftclock() at netbsd:Xsoftclock+0x3c
> --- interrupt ---
> --- switch to interrupt stack ---
> ?(bb784000,0,0,0,0) at 0
> db> reboot
> Mutex error: mutex_vector_enter: locking against myself
> 
> lock address : 0x00000000c0b4df04
> current cpu  :                  0
> current lwp  : 0x00000000ca965e00
> owner field  : 0x0000000000010b00 wait/spin:                0/1
> 
> panic: lock error
> fatal breakpoint trap in supervisor mode
> trap type 1 code 0 eip c05ce919 cs 9 eflags 282 cr2 0 ilevel b
> Stopped in pid 0.2 (system) at  netbsd:breakpoint+0x1:  ret
> db> trace
> breakpoint(c08d6542,cb191fa8,0,0,cb191fa8) at netbsd:breakpoint+0x1
> panic(c08d584b,c08d1868,c08d1985,c08d1998,c0b4df04) at netbsd:panic+0xcc
> lockdebug_abort(c0b4df04,c0aa52ec,c08d1985,c08d1998,b) at netbsd:lockdebug_abort+0x6a
> mutex_abort(c0b4df04,c08d1985,c08d1998,c04ef12d,c0b4edc6) at netbsd:mutex_abort+0x42
> mutex_vector_enter(c0b4df04,2,cb19205c,c0528892,cd22e1c0) at netbsd:mutex_vector_enter+0x86
> lwp_lock(cd22e1c0,0,cb19209c,c055032b,2) at 0xc05288e9
> selwakeup(c18b494c,0,cb19209c,c054fa93,c18b494c) at netbsd:selwakeup+0x69
> selnotify(c18b494c,0,c19af600,c19af600,42) at netbsd:selnotify+0x11
> sowakeup(c18b48c8,c18b494c,1,c19af600,1) at netbsd:sowakeup+0x19
> udp4_sendup(c18a3e00,1c,cb192188,c18b48c8,c18a3e00) at netbsd:udp4_sendup+0x171
> udp4_realinput(cb192188,cb192178,cb1921c4,1c,ca9623c0) at netbsd:udp4_realinput+0x15a
> udp_input(c18a3e00,14,11,1,c0b4da44) at netbsd:udp_input+0x223
> ip_input(c18a3e00,800,c18a8d00,caf7d030,2) at netbsd:ip_input+0xae7
> ipintr(cb192278,cb190011,c05e0031,c01f0011,11) at netbsd:ipintr+0x6d
> Xsoftnet() at netbsd:Xsoftnet+0x57
> --- interrupt ---
> 0x9:
> db> reboot
> Mutex error: mutex_vector_enter: locking against myself
> 
> lock address : 0x00000000cd561024
> current cpu  :                  0
> current lwp  : 0x00000000ca965e00
> owner field  : 0x0000000000010b00 wait/spin:                0/1
> 
> panic: lock error
> fatal breakpoint trap in supervisor mode
> trap type 1 code 0 eip c05ce919 cs 9 eflags 292 cr2 0 ilevel b
> Stopped in pid 0.2 (system) at  netbsd:breakpoint+0x1:  ret
> db> trace
> breakpoint(c08d6542,cb191970,0,0,cb191970) at netbsd:breakpoint+0x1
> panic(c08d584b,c08d1868,c08d1985,c08d1998,cd561024) at netbsd:panic+0xcc
> lockdebug_abort(cd561024,c0aa52ec,c08d1985,c08d1998,b) at netbsd:lockdebug_abort+0x6a
> mutex_abort(cd561024,c08d1985,c08d1998,c04ef349,c0b4e536) at netbsd:mutex_abort+0x42
> mutex_vector_enter(cd561024,2000000,cb191a24,c05e145c,1000000) at netbsd:mutex_vector_enter+0x86
> sched_pstats(0,c19c8418,c18a4200,c1a42000,762cf000) at netbsd:sched_pstats+0x39
> callout_softclock(0,c19b2d00,800,c0b52d00,c180dcc0) at netbsd:callout_softclock+0x52a
> softintr_dispatch(0,cb191b14,cb190011,c05e0031,c01f0011) at netbsd:softintr_dispatch+0x9a
> Xsoftclock() at netbsd:Xsoftclock+0x3c
> --- interrupt ---
> 0x9:
> db> reboot
> rebooting...
> 
> -- Kazushi
> Screw up your courage!  You've screwed up everything else.
-- 
Manuel Bouyer, LIP6, Universite Paris VI.           Manuel.Bouyer@lip6.fr
     NetBSD: 26 ans d'experience feront toujours la difference
--