NetBSD-Bugs archive


kern/37718: RAIDframe regression after vmlocking2 merge



>Number:         37718
>Category:       kern
>Synopsis:       reconstruct-in-place no longer works after vmlocking2 merge
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Jan 08 15:55:01 +0000 2008
>Originator:     oster%netbsd.org@localhost
>Release:        NetBSD 4.99.48
>Organization:
>Environment:
        
        
System: NetBSD 4.99.48 (RAIDFRAME.ddbLD) #6: Mon Jan  7 17:01:28 CST 2008
        oster@quad:/u1/devel/current/src/sys/arch/i386/compile/RAIDFRAME.ddbLD
Architecture: i386
Machine: i386
>Description:
        Attempt to do a reconstruct-in-place of a failed (or
non-failed) component.  Watch the machine keel over as follows:

rizzo# raidctl -vR /dev/sd3f raid1
Reconstruction suvm_fault(0xc0ae75a0, 0, 1) -> 0xe
kernel: supervisor trap page fault, code=0
Stopped in pid 0.33 (system) at netbsd:turnstile_block+0x165:   movl    0x10(%ebx),%eax
db> tr
turnstile_block(0,1,ca13a330,c0a4a744,0) at netbsd:turnstile_block+0x165
mutex_vector_enter(ca13a330,1ed,0,0,10) at netbsd:mutex_vector_enter+0xf9
vget(ca13a330,10,cb45f72c,c04c050b,cb45f71c) at netbsd:vget+0x17d
cache_lookup(ca13bbf8,cb45fa7c,cb45fa90,c13d8168,0) at netbsd:cache_lookup+0xf7
ufs_lookup(cb45f814,ca13bd90,cb45f82c,c04b2226,c07de720) at netbsd:ufs_lookup+0xcc
VOP_LOOKUP(ca13bbf8,cb45fa7c,cb45fa90,20002,ca13bd90) at netbsd:VOP_LOOKUP+0x2d
lookup(cb45fa68,20002,400,cb45fa84,0) at netbsd:lookup+0x20b
namei(cb45fa68,0,cb45fa6c,1,0) at netbsd:namei+0x145
vn_open(cb45fa68,3,0,ca13a330,ffffffff) at netbsd:vn_open+0x71
dk_lookup(c0f7c800,cb479820,cb45fcfc,1,0) at netbsd:dk_lookup+0x5a
rf_ReconstructInPlace(c0f25000,0,c13c55e0,c13c55e0,c01dd4b0) at netbsd:rf_ReconstructInPlace+0x18d
rf_ReconstructInPlaceThread(c13c55e0,0,c01002bd,0,c01002bd) at netbsd:rf_ReconstructInPlaceThread+0x3d
db> show reg
ds          0x10
es          0x10
fs          0x30
gs          0x10
edi         0xcb479820
esi         0xc09c688b  copyright+0x43f4b
ebp         0xcb45f68c
ebx         0xfffffff0
edx         0xc0ae8fa0  turnstile_tab+0x4c0
ecx         0xcb479820
eax         0xfffffff0
eip         0xc0469375  turnstile_block+0x165
cs          0x8
eflags      0x10287
esp         0xcb45f654
ss          0x10
netbsd:turnstile_block+0x165:   movl    0x10(%ebx),%eax
db>
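
One possible reading of the register dump above, offered as an observation
rather than a diagnosis: the faulting instruction loads from 0x10(%ebx), and
with ebx = 0xfffffff0 the effective address wraps around to 0 on 32-bit
i386.  That is consistent with the second argument (0) printed in the
uvm_fault line, assuming that argument is the faulting address.  The
sketch below only redoes the arithmetic; the names in it are illustrative
and do not come from the kernel sources.

```python
# Values copied from the ddb output above; names are illustrative only.
EBX = 0xFFFFFFF0       # ebx at the time of the trap ("show reg")
DISPLACEMENT = 0x10    # from "movl 0x10(%ebx),%eax"
MASK_32 = 0xFFFFFFFF   # i386 effective addresses are 32 bits wide


def effective_address(base: int, disp: int) -> int:
    """Return base + disp with 32-bit wraparound."""
    return (base + disp) & MASK_32


fault_addr = effective_address(EBX, DISPLACEMENT)
print(hex(fault_addr))  # 0x0
```

If that reading is right, the load went through a pointer value of -16
(0xfffffff0), so the access landed on address 0.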

Boot a LOCKDEBUG kernel and attempt the same reconstruct.  The result
is the following:


rizzo# raidctl -vR /dev/sd3f raid1
Reconstruction sMutex error: lockdebug_barrier: spin lock held

lock address : 0x00000000c0af5fe0 type     :               spin
shared holds :                  0 exclusive:                  1
shares wanted:                  0 exclusive:                  0
current cpu  :                  0 last held:                  0
current lwp  : 0x00000000cb519820 last held: 0x00000000cb519820
last locked  : 0x00000000c04714a6 unlocked : 000000000000000000
initialized  : 0x00000000c0471556
owner field  : 0x0000000000010600 wait/spin:                0/1

panic: LOCKDEBUG
Stopped in pid 0.33 (system) at netbsd:breakpoint+0x1:  ret
db> tr
breakpoint(c09d4b0b,c09d0e0b,c07e775c,c09d4b2d,c0b00400) at netbsd:breakpoint+0x1
lockdebug_abort1(c09d4b2d,1,c0b003c0,c047bdd4,1) at netbsd:lockdebug_abort1+0x6b
lockdebug_barrier(c0af38a0,1,1,0,c047cb2e) at netbsd:lockdebug_barrier+0xdd
rw_vector_enter(c0af45e4,0,0,80000000,0) at netbsd:rw_vector_enter+0x1f3
vm_map_lock_read(c0af45e0,c0ad5b48,c0ad18a0,c0b003c0,c047bdd4) at netbsd:vm_map_lock_read+0x21
uvm_fault_internal(c0af45e0,0,1,0,0) at netbsd:uvm_fault_internal+0xa2
trap() at netbsd:trap+0x6de
--- trap (number 6) ---
turnstile_block(0,1,ca15a330,c0a57744,4) at netbsd:turnstile_block+0x185
mutex_vector_enter(ca15a330,1ed,0,ca15bbf8,10) at netbsd:mutex_vector_enter+0x159
vget(ca15a330,10,cb52f72c,c04c9e8b,cb52f71c) at netbsd:vget+0x17d
cache_lookup(ca15bbf8,cb52fa7c,cb52fa90,c047bdd4,5) at netbsd:cache_lookup+0xf7
ufs_lookup(cb52f814,ca15bd90,cb52f82c,c04bbb96,c07e8820) at netbsd:ufs_lookup+0xcc
VOP_LOOKUP(ca15bbf8,cb52fa7c,cb52fa90,20002,ca15bd90) at netbsd:VOP_LOOKUP+0x2d
lookup(cb52fa68,20002,400,cb52fa84,c0ad5b48) at netbsd:lookup+0x20b
namei(cb52fa68,cb35c3c0,0,c047c8be,c0ad5b48) at netbsd:namei+0x145
vn_open(cb52fa68,3,0,c047bd6d,c0ad5b48) at netbsd:vn_open+0x71
dk_lookup(c1223800,ca14c400,cb52fcfc,1,6) at netbsd:dk_lookup+0x5a
rf_ReconstructInPlace(c0f3c000,0,c13bc660,c13bc660,c01e0fe0) at netbsd:rf_ReconstructInPlace+0x1ef
rf_ReconstructInPlaceThread(c13bc660,0,c01002bd,0,c01002bd) at netbsd:rf_ReconstructInPlaceThread+0x3d
db> 

This problem does not exist in 4.99.47.  That kernel, on the same box
under exactly the same circumstances, works just fine.

This problem is very repeatable on my test box.  Additional
information available upon request.

>How-To-Repeat:
        Run 'raidctl -vR /dev/sd3f raid1', where 'sd3f' is a component
of RAID set 'raid1'.

>Fix:
        PLEASE! :)




