NetBSD-Bugs archive
Re: kern/38273: "lockdebug_barrier: spin lock held" from ld_ataraid_start_raid0()
The following reply was made to PR kern/38273; it has been noted by GNATS.
From: "Greg A. Woods" <woods%planix.com@localhost>
To: NetBSD GNATS <gnats-bugs%NetBSD.org@localhost>
Cc: NetBSD GNATS Administrator <gnats-admin%NetBSD.org@localhost>,
NetBSD Kernel Technical Discussion List <tech-kern%NetBSD.org@localhost>
Subject: Re: kern/38273: "lockdebug_barrier: spin lock held" from
ld_ataraid_start_raid0()
Date: Sat, 23 Aug 2008 19:02:11 -0400
At Fri, 25 Apr 2008 17:15:04 +0000 (UTC), Me-planix.com wrote:
Subject: Re: kern/38273: "lockdebug_barrier: spin lock held" from ld_ataraid_start_raid0()
>
> I've been trying my hand at looking deeper at this problem but I'm
> having a difficult time figuring out which lock is which, and at this
> point I'm not even sure if the mutex_vector_enter() in the stack
> backtrace is the same as mutex_enter() in the source or not.
>
> The first line in ldstart() is:
>
>     mutex_enter(&sc->sc_mutex);
>
> Then a little bit later, before any mutex_exit(&sc->sc_mutex) there's a
> call, through the sc_start function pointer, to the ld_ataraid_start_raid0()
> routine.
>
> The only locking I can see that ld_ataraid_start_raid0() does is:
>
>     mutex_enter(&cbp->cb_buf.b_vp->v_interlock);
>
> Is that the same lock as is used in ldstart(), i.e. the sc_mutex?
>
> Interestingly I see that before and after calling biodone(), ldstart()
> releases and then re-acquires the sc_mutex (if I'm interpreting this
> right):
>
>     mutex_exit(&sc->sc_mutex);
>     biodone(bp);
>     mutex_enter(&sc->sc_mutex);
>
> Should the same be done before calling the sc_start function?
>
> Or should ld_ataraid_start_raid0() not be doing any locking at all?
As far as I can tell I haven't seen any reply to this yet.
It's still happening. I hadn't even got this far until today when
Juergen Hannken-Illjes suggested a working fix for my PR# 38636.
Now I'm back to this one. I've CC'ed tech-kern once again to see if
fresh eyes might help spot something obvious.
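To make the question in the quoted message concrete, here is a minimal
user-space sketch of the two calling conventions, written with POSIX
threads rather than the kernel's mutex(9). Everything named demo_* is
hypothetical and is not taken from ld.c or ld_ataraid.c; it only
illustrates "call the start routine with the lock held" versus "drop the
lock around the call, as ldstart() does around biodone()". It says
nothing about which one is the right fix for this PR.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver softc; NOT the real struct ld_softc. */
struct demo_softc {
	pthread_mutex_t sc_mutex;	/* plays the role of sc->sc_mutex */
	pthread_mutex_t other_lock;	/* plays the role of the second lock */
	int (*sc_start)(struct demo_softc *);
	int caller_held_lock;		/* 1 if sc_mutex was held on entry */
	int started;			/* number of times sc_start ran */
};

/* Backend start routine: records the lock state, then takes a second lock. */
int
demo_backend_start(struct demo_softc *sc)
{
	/* trylock on a default mutex returns non-zero if it is already locked. */
	if (pthread_mutex_trylock(&sc->sc_mutex) == 0) {
		sc->caller_held_lock = 0;
		pthread_mutex_unlock(&sc->sc_mutex);
	} else {
		sc->caller_held_lock = 1;
	}

	pthread_mutex_lock(&sc->other_lock);
	sc->started++;
	pthread_mutex_unlock(&sc->other_lock);
	return 0;
}

/* Variant A: call through sc_start while still holding sc_mutex. */
int
demo_start_holding(struct demo_softc *sc)
{
	int error;

	pthread_mutex_lock(&sc->sc_mutex);
	error = sc->sc_start(sc);
	pthread_mutex_unlock(&sc->sc_mutex);
	return error;
}

/* Variant B: release sc_mutex around the call, then re-acquire it. */
int
demo_start_dropping(struct demo_softc *sc)
{
	int error;

	pthread_mutex_lock(&sc->sc_mutex);
	/* ... queue bookkeeping would happen here, under the lock ... */
	pthread_mutex_unlock(&sc->sc_mutex);

	error = sc->sc_start(sc);

	pthread_mutex_lock(&sc->sc_mutex);
	/* ... more bookkeeping under the lock ... */
	pthread_mutex_unlock(&sc->sc_mutex);
	return error;
}

void
demo_init(struct demo_softc *sc)
{
	pthread_mutex_init(&sc->sc_mutex, NULL);
	pthread_mutex_init(&sc->other_lock, NULL);
	sc->sc_start = demo_backend_start;
	sc->caller_held_lock = 0;
	sc->started = 0;
}
```

After demo_init(), calling demo_start_holding() leaves caller_held_lock
set, while demo_start_dropping() leaves it clear. The cost of variant B
is that anything protected by sc_mutex can change while the lock is
dropped, so the caller has to re-check its state after re-acquiring it.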
FYI, here's what the crash looks like today:
Mutex error: lockdebug_barrier: spin lock held
lock address : 0x00000000d185d7ac type : spin
initialized : 0x00000000c01f430c
shared holds : 0 exclusive: 1
shares wanted: 0 exclusive: 0
current cpu : 0 last held: 0
current lwp : 0x00000000d1e57380 last held: 0x00000000d1e57380
last locked : 0x00000000c01f3cee unlocked : 0x00000000c01f3d6b
owner field : 0x0000000000010600 wait/spin: 0/1
panic: LOCKDEBUG
fatal breakpoint trap in supervisor mode
trap type 1 code 0 eip c05ac52c cs 8 eflags 246 cr2 bbbfb000 ilevel 6
Stopped in pid 857.1 (newfs) at netbsd:breakpoint+0x4: popl %ebp
db{0}> trace
breakpoint(c0afbae3,d1c3d8c8,c0b29800,c04e351f,6,1,0,0,d1c3d8c8,8) at netbsd:breakpoint+0x4
panic(c0a9eddc,c0a9a5f7,c087af90,c0a9edf5,0,1000001,6,0,0,d1823b80) at netbsd:panic+0x1b8
lockdebug_abort1(c0a9edf5,1,0,0,c0aa38ce,d185d6cc,d1c3d92c,c049a1ca,c31f7e60,c0b25fa4) at netbsd:lockdebug_abort1+0xbb
mutex_vector_enter(d1823b80,0,cc4c0000,200,6,0,c01f3cee,c32c5f44,0,efff1749) at netbsd:mutex_vector_enter+0x437
ld_ataraid_start_raid0(d185d6cc,c31e860c,d1c3da4c,200,c32cda00,d185d7ac,d185d750,0,c31e860c,d185d6cc) at netbsd:ld_ataraid_start_raid0+0x2e2
ldstart(6,c31e860c,0,0,c04b358b,101,0,d1818830,0,c32cda00) at netbsd:ldstart+0x6e
ldstrategy(c31e860c,200,200,1,0,d181881c,d1818830,d1818834,bbbb5000,d1e57380) at netbsd:ldstrategy+0x171
physio(c01f4770,0,4500,0,c01f3500,d1c3dc5c,d1c3db4c,c04d64b0,4500,d1c3dc5c) at netbsd:physio+0x251
ldwrite(4500,d1c3dc5c,10,8,d1b09720,d1c3dc5c,6,d1e57380,d1c3dbe4,d1b09680) at netbsd:ldwrite+0x35
cdev_write(4500,d1c3dc5c,10,2,d1b09720,d17fd000,d1c3db8c,c0522bf7,d1b09720,1) at netbsd:cdev_write+0x70
spec_write(d1c3dbe4,bbbf8000,c087c740,d1b09680,2,20002,d1c3dbfc,c052e058,c087c240,d1b09680) at netbsd:spec_write+0xa0
VOP_WRITE(d1b09680,d1c3dc5c,10,cc4a6a80,0,0,2,16,200,bbbb5000) at netbsd:VOP_WRITE+0x6c
vn_write(d1e1c980,d1c3dcc4,d1c3dc5c,cc4a6a80,0,ffffffff,d1c3dc8c,c053632c,d1c3dc6c,d1e1c900) at netbsd:vn_write+0xb1
dofilewrite(4,d1e1c980,bbbb5000,200,d1c3dcc4,0,d1c3dd28,c05b5b7f,0,0) at netbsd:dofilewrite+0x75
sys_pwrite(d1e57380,d1c3dd00,d1c3dd28,bbbfb000,bbbfb000,d1ea2dd8,2,4,bbbb5000,200) at netbsd:sys_pwrite+0xc7
syscall(d1c3dd48,b3,ab,1f,1f,0,1749efff,bfbfc8b8,0,0) at netbsd:syscall+0xab
db{0}> x/I 0x00000000c01f3cee
netbsd:ldstart+0x1e: testl %esi,%esi
db{0}> x/I 0x00000000d1e57380
0xd1e57380: addb %al,0(%eax)
db{0}> x/I 0x00000000c01f3d6b
netbsd:ldstart+0x9b: addl $0x1c,%esp
db{0}> x/I 0x00000000c01f430c
netbsd:ldattach+0x2c: testb $0x1,0x128(%edi)
db{0}> call simple_lock_dump
Symbol not found
db{0}>
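One possible cross-check on "which lock is which", offered only as a
sketch: the "lock address" in the report can be compared against the
first word ddb prints for ld_ataraid_start_raid0(). This assumes that
word really is the softc pointer, which is not guaranteed, since the
words ddb shows on i386 are just stack contents. The helper name below is
made up for illustration; the two constants are copied from the output
above.

```c
#include <assert.h>
#include <stdint.h>

/* "lock address" from the lockdebug report. */
#define LOCK_ADDR	UINT32_C(0xd185d7ac)
/* First word ddb shows for ld_ataraid_start_raid0(); ASSUMED to be the softc. */
#define SOFTC_GUESS	UINT32_C(0xd185d6cc)

/* Byte distance from the assumed softc pointer to the reported lock. */
uint32_t
lock_offset_in_softc(void)
{
	return LOCK_ADDR - SOFTC_GUESS;
}
```

The difference is 0xe0 bytes. If the assumption holds, the held spin lock
sits a short way inside the softc, which would be consistent with the
"initialized" and "last locked" addresses resolving to ldattach() and
ldstart(). It does not prove that the lock is sc_mutex.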
-- 
Greg A. Woods
Planix, Inc.
<woods%planix.com@localhost> +1 416 489-5852 x122
http://www.planix.com/