Re: kernel_lock, splbio() and SMP_SAFE

To: tech-kern%NetBSD.org@localhost
Subject: Re: kernel_lock, splbio() and SMP_SAFE
From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
Date: Mon, 3 Aug 2009 17:27:57 +0200

On Mon, Aug 03, 2009 at 01:37:56PM +0200, Manuel Bouyer wrote:
> Hello,
> there's one thing I didn't get in the new world order of the NetBSD
> kernel (on i386, FWIW):
> given a low-level subsystem (say, a driver) which is not SMP-safe
> and is using the old spl() locking scheme, how is this subsystem
> protected from SMP-safe upper-level subsystems calling one of its
> entry points? I would expect kernel_lock() to be taken at some
> point, but I can't find where it's happening in this case.
> 
> I added
> KASSERT(__SIMPLELOCK_LOCKED_P(kernel_lock));
> in wdstart() and sdstart(), and ended up with:
> root on raid0a dumps on raid0b
> panic: kernel diagnostic assertion "__SIMPLELOCK_LOCKED_P(kernel_lock)" failed: file "/dsk/l1/misc/bouyer/netbsd-5/src/sys/dev/ata/wd.c", line 620
> fatal breakpoint trap in supervisor mode
> trap type 1 code 0 eip c03f26dc cs 8 eflags 246 cr2 0 ilevel 6
> Stopped in pid 0.1 (system) at  netbsd:breakpoint+0x4:  popl    %ebp
> db{1}>  tr
> breakpoint(c0646f16,c07c69d4,c2cf2800,0,2,0,c07c6a08,c035190f,5f,0) at netbsd:breakpoint+0x4
> panic(c068fb90,c061b30d,c065b8f8,c065c99c,26c,0,c07c6a08,c041bf45,c061b30d,c065c99c) at netbsd:panic+0x1b0
> __kernassert(c061b30d,c065c99c,26c,c065b8f8,0,c2e68a1c,c07c6a48,c041c437,ce31a7d0,c2e68a1c) at netbsd:__kernassert+0x39
> wdstart(ce31a7d0,c2e68a1c,1,0,6,8,c07c6a48,400,5f,0) at netbsd:wdstart+0xf5
> wdstrategy(c2e68a1c,c0713fd8,c07c6a98,0,0,c2df8000,c07c6c38,c0186c9a,8,ce322678) at netbsd:wdstrategy+0x1c7
> raidread_component_label(8,ce322678,c07c6aa4,c0315077,0,c069ff20,0,0,4,4) at netbsd:raidread_component_label+0x69
> rf_markalldirty(1203,c018a550,c2df5200,c2df5000,c070bf80,c0709140,cd5ee240,c2cf2918,c0709140,4) at netbsd:rf_markalldirty+0x9a
> raidopen(1201,0,6000,c069ff20,0,c2df7000,0,0,14,0) at netbsd:raidopen+0x240
> raidsize(1201,0,0,0,0,14,c07c6d38,c030a04b,0,0) at netbsd:raidsize+0x102
> cpu_dumpconf(0,0,14,0,0,c03098a0,0,0,c07090b8,0) at netbsd:cpu_dumpconf+0x37
> main(0,c01002a7,0,0,0,0,0,0,0,0) at netbsd:main+0x28b
> 
> I can't believe we have such a big synchronisation issue in our kernel
> that would have gone unnoticed for so long. Can someone explain to me
> how this is supposed to work?

This seems to happen only before init is started, so it does not do
much harm. Replacing the KASSERT with a call to debugger() shows that in
multiuser mode, wdstart() and sdstart() are called with kernel_lock held.
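
For illustration only, here is a minimal userland sketch of that kind of
non-fatal check: instead of panicking on the first violation, it records
each entry made without the lock so the boot-time and multiuser cases can
be compared. All names in it (mock_kernel_lock_held, check_kernel_lock,
mock_wdstart) are hypothetical stand-ins and not the real NetBSD symbols;
the actual test was done in the kernel, where the reporting step would be
the debugger call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for "is kernel_lock held?"; not the real lock. */
static bool mock_kernel_lock_held = false;

/* Number of entries observed without the (mock) lock held. */
static int unlocked_entries = 0;

/*
 * Non-fatal replacement for the assertion: note the violation and carry
 * on, so later calls can still be observed.
 */
static void
check_kernel_lock(const char *func)
{
	assert(func != NULL);
	if (!mock_kernel_lock_held) {
		unlocked_entries++;
		fprintf(stderr, "%s: entered without kernel_lock\n", func);
		/* In the kernel, this is where one would enter the debugger. */
	}
}

/* Hypothetical stand-in for a driver start routine such as wdstart(). */
static void
mock_wdstart(void)
{
	check_kernel_lock(__func__);
	/* ... the driver's real work would go here ... */
}
```

Calling mock_wdstart() once with mock_kernel_lock_held false and once
with it true should leave unlocked_entries at 1, which mirrors the
"only before init" observation above.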

-- 
Manuel Bouyer, LIP6, Universite Paris VI.           
Manuel.Bouyer%lip6.fr@localhost
     NetBSD: 26 years of experience will always make the difference
--

Follow-Ups:
- Re: kernel_lock, splbio() and SMP_SAFE
  - From: Mindaugas Rasiukevicius

References:
- kernel_lock, splbio() and SMP_SAFE
  - From: Manuel Bouyer
