NetBSD-Bugs archive

Re: kern/38273: "lockdebug_barrier: spin lock held" from ld_ataraid_start_raid0()



The following reply was made to PR kern/38273; it has been noted by GNATS.

From: "Greg A. Woods" <woods%planix.com@localhost>
To: NetBSD GNATS <gnats-bugs%NetBSD.org@localhost>
Cc: NetBSD GNATS Administrator <gnats-admin%NetBSD.org@localhost>,
    NetBSD Kernel Technical Discussion List <tech-kern%NetBSD.org@localhost>
Subject: Re: kern/38273: "lockdebug_barrier: spin lock held" from ld_ataraid_start_raid0()
Date: Sat, 23 Aug 2008 19:02:11 -0400

 At Fri, 25 Apr 2008 17:15:04 +0000 (UTC), Me-planix.com wrote:
 Subject: Re: kern/38273: "lockdebug_barrier: spin lock held" from ld_ataraid_start_raid0()
 >
 >  I've been trying my hand at looking deeper at this problem but I'm
 >  having a difficult time figuring out which lock is which, and at this
 >  point I'm not even sure if the mutex_vector_enter() in the stack
 >  backtrace is the same as mutex_enter() in the source or not.
 >
 >  The first line in ldstart() is:
 >
 >      mutex_enter(&sc->sc_mutex);
 >
 >  Then a little bit later, before any mutex_exit(&sc->sc_mutex) there's a
 >  call, through the sc_start function pointer, to the ld_ataraid_start_raid0()
 >  routine.
 >
 >  The only locking I can see that ld_ataraid_start_raid0() does is:
 >
 >                      mutex_enter(&cbp->cb_buf.b_vp->v_interlock);
 >
 >  Is that the same lock as is used in ldstart(), i.e. the sc_mutex?
 >
 >  Interestingly I see that before and after calling biodone(), ldstart()
 >  releases and then re-acquires the sc_mutex (if I'm interpreting this
 >  right):
 >
 >                              mutex_exit(&sc->sc_mutex);
 >                              biodone(bp);
 >                              mutex_enter(&sc->sc_mutex);
 >
 >  Should the same be done before calling the sc_start function?
 >
 >  Or should ld_ataraid_start_raid0() not be doing any locking at all?
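
The release-and-reacquire pattern around biodone() quoted above can be sketched in user space. This is only an illustration under stated assumptions: demo_softc, demo_init(), demo_enqueue(), demo_start() and demo_backend() are hypothetical names, a pthread mutex stands in for the kernel mutex, and nothing here establishes that dropping sc_mutex around the sc_start call is actually safe in ldstart().

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define DEMO_QLEN 8

/* Hypothetical stand-in for a driver softc; NOT the real struct ld_softc. */
struct demo_softc {
	pthread_mutex_t sc_mutex;	/* plays the role of sc->sc_mutex */
	int sc_lock_held;		/* 1 while demo_start() holds sc_mutex */
	int sc_queue[DEMO_QLEN];	/* pending request ids */
	size_t sc_head;
	size_t sc_tail;
	int (*sc_start)(struct demo_softc *, int);	/* back-end hook */
	int sc_started;			/* requests handed to the hook */
	int sc_seen_locked;		/* hook calls that ran with the lock held */
};

static void
demo_init(struct demo_softc *sc, int (*start)(struct demo_softc *, int))
{
	pthread_mutex_init(&sc->sc_mutex, NULL);
	sc->sc_lock_held = 0;
	sc->sc_head = sc->sc_tail = 0;
	sc->sc_start = start;
	sc->sc_started = 0;
	sc->sc_seen_locked = 0;
}

/* Queue one request id; returns 0 on success, -1 if the queue is full. */
static int
demo_enqueue(struct demo_softc *sc, int id)
{
	int error = -1;

	pthread_mutex_lock(&sc->sc_mutex);
	if (sc->sc_tail < DEMO_QLEN) {
		sc->sc_queue[sc->sc_tail++] = id;
		error = 0;
	}
	pthread_mutex_unlock(&sc->sc_mutex);
	return error;
}

/* Drain the queue, dropping sc_mutex around every call through sc_start. */
static void
demo_start(struct demo_softc *sc)
{
	pthread_mutex_lock(&sc->sc_mutex);
	sc->sc_lock_held = 1;
	while (sc->sc_head != sc->sc_tail) {
		int id = sc->sc_queue[sc->sc_head++];

		/* Release first, so the hook may take locks that can block. */
		sc->sc_lock_held = 0;
		pthread_mutex_unlock(&sc->sc_mutex);
		(void)(*sc->sc_start)(sc, id);
		pthread_mutex_lock(&sc->sc_mutex);
		sc->sc_lock_held = 1;

		/* The loop condition re-reads the queue after reacquiring. */
		sc->sc_started++;
	}
	sc->sc_lock_held = 0;
	pthread_mutex_unlock(&sc->sc_mutex);
}

/* Hypothetical back end: only records whether it ran with the lock held. */
static int
demo_backend(struct demo_softc *sc, int id)
{
	if (sc->sc_lock_held)
		sc->sc_seen_locked++;
	return id;
}
```

Calling demo_init(&sc, demo_backend), queueing a couple of ids with demo_enqueue() and then calling demo_start(&sc) leaves sc_seen_locked at zero, i.e. the hook never runs with the softc lock held. The cost of this pattern is that any state protected by the lock may have changed by the time it is reacquired, so it has to be re-checked.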
 
 As far as I can tell I haven't seen any reply to this yet.
 
 It's still happening.  I couldn't even get this far until today, when
 Juergen Hannken-Illjes suggested a working fix for my PR# 38636.
 
 Now I'm back to this one.  I've CC'ed tech-kern once again to see if
 fresh eyes might help spot something obvious.
 
 FYI, here's what the crash looks like today:
 
 Mutex error: lockdebug_barrier: spin lock held
 
 lock address : 0x00000000d185d7ac type     :               spin
 initialized  : 0x00000000c01f430c
 shared holds :                  0 exclusive:                  1
 shares wanted:                  0 exclusive:                  0
 current cpu  :                  0 last held:                  0
 current lwp  : 0x00000000d1e57380 last held: 0x00000000d1e57380
 last locked  : 0x00000000c01f3cee unlocked : 0x00000000c01f3d6b
 owner field  : 0x0000000000010600 wait/spin:                0/1
 
 panic: LOCKDEBUG
 fatal breakpoint trap in supervisor mode
 trap type 1 code 0 eip c05ac52c cs 8 eflags 246 cr2 bbbfb000 ilevel 6
 Stopped in pid 857.1 (newfs) at netbsd:breakpoint+0x4:  popl    %ebp
 db{0}> trace
 breakpoint(c0afbae3,d1c3d8c8,c0b29800,c04e351f,6,1,0,0,d1c3d8c8,8) at netbsd:breakpoint+0x4
 panic(c0a9eddc,c0a9a5f7,c087af90,c0a9edf5,0,1000001,6,0,0,d1823b80) at netbsd:panic+0x1b8
 lockdebug_abort1(c0a9edf5,1,0,0,c0aa38ce,d185d6cc,d1c3d92c,c049a1ca,c31f7e60,c0b25fa4) at netbsd:lockdebug_abort1+0xbb
 mutex_vector_enter(d1823b80,0,cc4c0000,200,6,0,c01f3cee,c32c5f44,0,efff1749) at netbsd:mutex_vector_enter+0x437
 ld_ataraid_start_raid0(d185d6cc,c31e860c,d1c3da4c,200,c32cda00,d185d7ac,d185d750,0,c31e860c,d185d6cc) at netbsd:ld_ataraid_start_raid0+0x2e2
 ldstart(6,c31e860c,0,0,c04b358b,101,0,d1818830,0,c32cda00) at netbsd:ldstart+0x6e
 ldstrategy(c31e860c,200,200,1,0,d181881c,d1818830,d1818834,bbbb5000,d1e57380) at netbsd:ldstrategy+0x171
 physio(c01f4770,0,4500,0,c01f3500,d1c3dc5c,d1c3db4c,c04d64b0,4500,d1c3dc5c) at netbsd:physio+0x251
 ldwrite(4500,d1c3dc5c,10,8,d1b09720,d1c3dc5c,6,d1e57380,d1c3dbe4,d1b09680) at netbsd:ldwrite+0x35
 cdev_write(4500,d1c3dc5c,10,2,d1b09720,d17fd000,d1c3db8c,c0522bf7,d1b09720,1) at netbsd:cdev_write+0x70
 spec_write(d1c3dbe4,bbbf8000,c087c740,d1b09680,2,20002,d1c3dbfc,c052e058,c087c240,d1b09680) at netbsd:spec_write+0xa0
 VOP_WRITE(d1b09680,d1c3dc5c,10,cc4a6a80,0,0,2,16,200,bbbb5000) at netbsd:VOP_WRITE+0x6c
 vn_write(d1e1c980,d1c3dcc4,d1c3dc5c,cc4a6a80,0,ffffffff,d1c3dc8c,c053632c,d1c3dc6c,d1e1c900) at netbsd:vn_write+0xb1
 dofilewrite(4,d1e1c980,bbbb5000,200,d1c3dcc4,0,d1c3dd28,c05b5b7f,0,0) at netbsd:dofilewrite+0x75
 sys_pwrite(d1e57380,d1c3dd00,d1c3dd28,bbbfb000,bbbfb000,d1ea2dd8,2,4,bbbb5000,200) at netbsd:sys_pwrite+0xc7
 syscall(d1c3dd48,b3,ab,1f,1f,0,1749efff,bfbfc8b8,0,0) at netbsd:syscall+0xab
 db{0}> x/I 0x00000000c01f3cee
 netbsd:ldstart+0x1e:    testl   %esi,%esi
 db{0}> x/I 0x00000000d1e57380
 0xd1e57380:     addb    %al,0(%eax)
 db{0}> x/I 0x00000000c01f3d6b
 netbsd:ldstart+0x9b:    addl    $0x1c,%esp
 db{0}> x/I 0x00000000c01f430c
 netbsd:ldattach+0x2c:   testb   $0x1,0x128(%edi)
 db{0}> call simple_lock_dump
 Symbol not found
 db{0}>
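
One way to approach the "which lock is which" question is to compare the addresses in the report. The sketch below only does arithmetic on values copied from the output above. It assumes that the words ddb prints in parentheses are the real arguments (on i386 they are raw stack words and may not be), and that the first word shown for ld_ataraid_start_raid0() is the softc pointer. The macro and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the lockdebug report and the ddb trace. */
#define HELD_LOCK_ADDR		UINT32_C(0xd185d7ac)	/* "lock address" */
#define START_FIRST_WORD	UINT32_C(0xd185d6cc)	/* 1st word, ld_ataraid_start_raid0() */
#define WANTED_LOCK_ADDR	UINT32_C(0xd1823b80)	/* 1st word, mutex_vector_enter() */

/*
 * Distance from the assumed softc pointer to the held spin lock.
 * A small positive offset would suggest the lock is a softc member.
 */
static uint32_t
held_lock_offset(void)
{
	return HELD_LOCK_ADDR - START_FIRST_WORD;
}

/* Nonzero if the lock being acquired is the one already held. */
static int
same_lock(void)
{
	return HELD_LOCK_ADDR == WANTED_LOCK_ADDR;
}
```

Under those assumptions held_lock_offset() gives 0xe0 and same_lock() gives 0: the held spin lock sits 0xe0 bytes past the assumed softc pointer and, per the x/I output, was last locked at ldstart+0x1e, which would be consistent with it being sc_mutex, while the lock passed to mutex_vector_enter() is at a different address. That is a reading of the dump, not a confirmed diagnosis.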
 
 -- 
                                                Greg A. Woods
                                                Planix, Inc.
 
 <woods%planix.com@localhost>     +1 416 489-5852 x122     http://www.planix.com/
 
 

