NetBSD-Bugs archive


kern/38184: we need a work-around to prevent crashes for what has been described as a design defect in the SA thread implementation



>Number:         38184
>Category:       kern
>Synopsis:       we need a work-around to prevent crashes for what has been described as a design defect in the SA thread implementation
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Mar 06 18:00:00 +0000 2008
>Originator:     Greg A. Woods
>Release:        NetBSD 4.0_STABLE 2008/03/03
>Organization:
Planix, Inc.; Toronto, Ontario; Canada
>Environment:
System: NetBSD 4.0_STABLE GENERIC.MP
Architecture: i386
Machine: i386
>Description:

        Andrew Doran described the crash I reported in a follow-up to PR#
        37993 as an unrelated issue.  He suggested that it might be
        caused by a design defect in the SA (scheduler activations)
        thread implementation, which is still used on the netbsd-4 (and
        earlier) branches.

        He says the problem is fixed in -current with 1:1 threads.

        However, I'm guessing that a pullup of the 1:1 threading change
        would be far too drastic and problematic to expect on any
        existing release branch, such as netbsd-4.

        Unfortunately, this problem is causing major headaches for me.
        My own file server is crashing nightly, apparently due to this
        problem in most cases, and as a result I'm unable to recommend
        upgrades to customers running NetBSD-3.0 and earlier on SMP
        boxes.

        On my machine the problem seems to occur regularly with the
        nightly cron jobs.  With LOCKDEBUG I'm now getting repeated
        identical crash results.

        This problem may even be related to similar, but far more
        occasional, crashes that I've seen on AlphaServer boxes running
        NetBSD-1.6.2.

        In fact, a cursory Google search for similar symptoms suggests
        that many people may be reporting related problems on the
        various mailing lists.  Many of those reports don't even seem
        to have been turned into official PRs.

        It would be really helpful to be able to find a fix or at least
        some kind of reliable work-around to this problem which doesn't
        involve doing without "options MULTIPROCESSOR" until NetBSD-5.x
        finally comes along and is ready for production use.

        First, from my PR# 37993 attachment:

simple_lock: locking against myself
lock: 0xc0a50924, currently at: /rest/work/woods/m-NetBSD-4/sys/uvm/uvm_map.h:417
on CPU 0
last locked: /rest/work/woods/m-NetBSD-4/sys/uvm/uvm_map.c:1021
last unlocked: /rest/work/woods/m-NetBSD-4/sys/uvm/uvm_map.c:1011
kernel_map_store(c09822f4,3f303fd,0,c0a67284,0) at 0xc0a50820
Stopped in pid 2264.2 (apcupsd) at      netbsd:cpu_Debugger+0x4:        popl    %ebp
db{0}> trace
cpu_Debugger(d98c25d0,1,ffff,c0959b17,c041c0e0) at netbsd:cpu_Debugger+0x4
_simple_lock(c0a50924,c09818f0,1a1,d98c2680,18c2704) at netbsd:_simple_lock+0x331
uvm_map_prepare(c0a50820,c0000000,20000,0,ffffffff) at netbsd:uvm_map_prepare+0x329
uvm_map(c0a50820,d98c26e4,20000,0,ffffffff) at netbsd:uvm_map+0xc0
km_vacache_alloc(c0a50980,0,525,206,c09d9bf4) at netbsd:km_vacache_alloc+0x67
pool_grow(c0a509f0,c0988ec4,3a7,3a4,977) at netbsd:pool_grow+0x4c
pool_get(c0a50980,0,d98c279c,202,d98c279c) at netbsd:pool_get+0x12f
uvm_km_alloc_poolpage_cache(c0a50820,0,525,206,c09d9bf4) at netbsd:uvm_km_alloc_poolpage_cache+0x59
pool_grow(c0a51ef0,c0988ec4,3a7,3a4,d992) at netbsd:pool_grow+0x4c
pool_get(c0a51e80,0,d98c285c,c0433942,0) at netbsd:pool_get+0x12f
sadata_upcall_alloc(0,58,d98c285c,202,c0a5094c) at netbsd:sadata_upcall_alloc+0x21
ltsleep(c0a5087c,4,c092e1f9,0,c0a50924) at netbsd:ltsleep+0x552
uvm_map_prepare(c0a50820,c0000000,20000,0,ffffffff) at netbsd:uvm_map_prepare+0x1b4
uvm_map(c0a50820,d98c2964,20000,0,ffffffff) at netbsd:uvm_map+0xc0
km_vacache_alloc(c0a50980,2,525,202,c09d9bf4) at netbsd:km_vacache_alloc+0x67
pool_grow(c0a509f0,c0988ec4,3a7,3a4,cffe7594) at netbsd:pool_grow+0x4c
pool_get(c0a50980,2,0,202,0) at netbsd:pool_get+0x12f
uvm_km_alloc_poolpage_cache(c0a50820,1,525,202,c09d9bf4) at netbsd:uvm_km_alloc_poolpage_cache+0x59
pool_grow(c0a6ccf0,c0988ec4,3a7,3a4,0) at netbsd:pool_grow+0x4c
pool_get(c0a6cc80,2,80f,3a4,d002c694) at netbsd:pool_get+0x12f
pool_cache_get_paddr(c0a6cb20,2,0,0,0) at netbsd:pool_cache_get_paddr+0x15b
pmap_create(d002c794,0,bfc00000,41,d002c690) at netbsd:pmap_create+0xe2
uvmspace_init(d002c690,0,0,bfc00000,d9848f80) at netbsd:uvmspace_init+0x75
uvmspace_alloc(0,bfc00000,d9848f80,c09818f0,1c2) at netbsd:uvmspace_alloc+0x3a
uvmspace_fork(d9848e7c,d97f2204,48,d98b9020,d98b9020) at netbsd:uvmspace_fork+0x116
uvm_proc_fork(d97f2204,d98b9020,0,246,c09d9bf4) at netbsd:uvm_proc_fork+0x23
fork1(d97ef4bc,0,14,0,0) at netbsd:fork1+0x2f9
sys_fork(d97ef4bc,d98c2c48,d98c2c68,813811f,8138000) at netbsd:sys_fork+0x45
syscall_plain() at netbsd:syscall_plain+0x1a8
--- syscall (number 2) ---
0x80f6613:
db{0}> call simple_lock_dump()
all simple locks:
0xc0a67274 CPU 0 /rest/work/woods/m-NetBSD-4/sys/kern/kern_lock.c:1476
0xc0a50924 CPU 0 /rest/work/woods/m-NetBSD-4/sys/uvm/uvm_map.c:1021
0x80000000
db{0}>

        Now from the past two nights:

[Wed Mar  5 04:27:13 2008]simple_lock: locking against myself
[Wed Mar  5 04:27:13 2008]lock: 0xc0a50924, currently at: /rest/work/woods/m-NetBSD-4/sys/uvm/uvm_map.h:417
[Wed Mar  5 04:27:13 2008]on CPU 7
[Wed Mar  5 04:27:13 2008]last locked: /rest/work/woods/m-NetBSD-4/sys/uvm/uvm_map.c:1021
[Wed Mar  5 04:27:13 2008]last unlocked: /rest/work/woods/m-NetBSD-4/sys/uvm/uvm_map.c:1011
[Wed Mar  5 04:27:13 2008]once apcupsd[223kernel_map_store(c09822f4,3f303fd,c68f1f38,c0a67284,7) at 7]: Power failur0xc0a50820
[Wed Mar  5 04:27:13 2008]e.
[Wed Mar  5 04:27:13 2008]Stopped in pid 2237.1 (apcupsd) at      netbsd:cpu_Debugger+0x4:        popl    %ebp
[Wed Mar  5 04:27:13 2008]db{7}> trace
[Wed Mar  5 10:20:23 2008]cpu_Debugger(d97755d0,1,ffff,c0959b17,c041c0e0) at netbsd:cpu_Debugger+0x4
[Wed Mar  5 10:20:23 2008]_simple_lock(c0a50924,c09818f0,1a1,d9775680,1775704) at netbsd:_simple_lock+0x331
[Wed Mar  5 10:20:24 2008]uvm_map_prepare(c0a50820,c0000000,20000,0,ffffffff) at netbsd:uvm_map_prepare+0x329
[Wed Mar  5 10:20:24 2008]uvm_map(c0a50820,d97756e4,20000,0,ffffffff) at netbsd:uvm_map+0xc0
[Wed Mar  5 10:20:24 2008]km_vacache_alloc(c0a50980,0,525,206,c09d9bf4) at netbsd:km_vacache_alloc+0x67
[Wed Mar  5 10:20:24 2008]pool_grow(c0a509f0,c0988ec4,3a7,3a4,26dc) at netbsd:pool_grow+0x4c
[Wed Mar  5 10:20:24 2008]pool_get(c0a50980,0,d977579c,202,d977579c) at netbsd:pool_get+0x12f
[Wed Mar  5 10:20:24 2008]uvm_km_alloc_poolpage_cache(c0a50820,0,525,206,c09d9bf4) at netbsd:uvm_km_alloc_poolpage_cache+0x59
[Wed Mar  5 10:20:24 2008]pool_grow(c0a51ef0,c0988ec4,3a7,3a4,4b2c) at netbsd:pool_grow+0x4c
[Wed Mar  5 10:20:24 2008]pool_get(c0a51e80,0,d977585c,c0433942,0) at netbsd:pool_get+0x12f
[Wed Mar  5 10:20:24 2008]sadata_upcall_alloc(0,58,d977585c,202,c0a5094c) at netbsd:sadata_upcall_alloc+0x21
[Wed Mar  5 10:20:24 2008]ltsleep(c0a5087c,4,c092e1f9,0,c0a50924) at netbsd:ltsleep+0x552
[Wed Mar  5 10:20:24 2008]uvm_map_prepare(c0a50820,c0000000,20000,0,ffffffff) at netbsd:uvm_map_prepare+0x1b4
[Wed Mar  5 10:20:24 2008]uvm_map(c0a50820,d9775964,20000,0,ffffffff) at netbsd:uvm_map+0xc0
[Wed Mar  5 10:20:24 2008]km_vacache_alloc(c0a50980,2,525,202,c09d9bf4) at netbsd:km_vacache_alloc+0x67
[Wed Mar  5 10:20:24 2008]pool_grow(c0a509f0,c0988ec4,3a7,3a4,d9017000) at netbsd:pool_grow+0x4c
[Wed Mar  5 10:20:24 2008]pool_get(c0a50980,2,d9775b0c,202,d0020f04) at netbsd:pool_get+0x12f
[Wed Mar  5 10:20:24 2008]uvm_km_alloc_poolpage_cache(c0a50820,1,525,202,c09d9bf4) at netbsd:uvm_km_alloc_poolpage_cache+0x59
[Wed Mar  5 10:20:24 2008]pool_grow(c0a6ccf0,c0988ec4,3a7,3a4,0) at netbsd:pool_grow+0x4c
[Wed Mar  5 10:20:24 2008]pool_get(c0a6cc80,2,80f,3a4,d98aa7f4) at netbsd:pool_get+0x12f
[Wed Mar  5 10:20:24 2008]pool_cache_get_paddr(c0a6cb20,2,0,0,0) at netbsd:pool_cache_get_paddr+0x15b
[Wed Mar  5 10:20:24 2008]pmap_create(d98aa8f4,0,bfc00000,41,d98aa7f0) at netbsd:pmap_create+0xe2
[Wed Mar  5 10:20:24 2008]uvmspace_init(d98aa7f0,0,0,bfc00000,d9787f80) at netbsd:uvmspace_init+0x75
[Wed Mar  5 10:20:24 2008]uvmspace_alloc(0,bfc00000,d9787f80,c09818f0,1c2) at netbsd:uvmspace_alloc+0x3a
[Wed Mar  5 10:20:24 2008]uvmspace_fork(d9787e7c,d970d018,48,d9be37dc,d9be37dc) at netbsd:uvmspace_fork+0x116
[Wed Mar  5 10:20:24 2008]uvm_proc_fork(d970d018,d9be37dc,0,d9775bc4,1) at netbsd:uvm_proc_fork+0x23
[Wed Mar  5 10:20:24 2008]fork1(d970a8c4,0,14,0,0) at netbsd:fork1+0x2f9
[Wed Mar  5 10:20:24 2008]sys_fork(d970a8c4,d9775c48,d9775c68,d9775c50,246) at netbsd:sys_fork+0x45
[Wed Mar  5 10:20:24 2008]syscall_plain() at netbsd:syscall_plain+0x1a8
[Wed Mar  5 10:20:24 2008]--- syscall (number 2) ---
[Wed Mar  5 10:20:24 2008]0x80f6613:
[Wed Mar  5 10:20:24 2008]db{7}> call simple_lock_dump
[Wed Mar  5 10:20:42 2008]all simple locks:
[Wed Mar  5 10:20:42 2008]0xc0a67274 CPU 7 /rest/work/woods/m-NetBSD-4/sys/kern/kern_lock.c:1476
[Wed Mar  5 10:20:42 2008]0xc0a50924 CPU 7 /rest/work/woods/m-NetBSD-4/sys/uvm/uvm_map.c:1021
[Wed Mar  5 10:20:42 2008]0x80000000
[Wed Mar  5 10:20:42 2008]db{7}> reboot
[Wed Mar  5 10:22:29 2008]syncing disks... 
[Wed Mar  5 10:22:30 2008]switching with held simple_lock 0xc0a50924 CPU 7 /rest/work/woods/m-NetBSD-4/sys/uvm/uvm_map.c:1021
[Wed Mar  5 10:22:30 2008]copyright(0,0,c5e7c800,d970a8c4,0) at 0xc098778c
[Wed Mar  5 10:22:30 2008]Bad frame pointer: 0xc678cc40
[Wed Mar  5 10:22:30 2008]Stopped in pid 2237.1 (apcupsd) at      netbsd:cpu_Debugger+0x4:        popl    %ebp
[Wed Mar  5 10:22:30 2008]db{7}> reboot
[Wed Mar  5 10:22:39 2008]rebooting...



[Thu Mar  6 03:46:59 2008]simple_lock: locking against myself
[Thu Mar  6 03:46:59 2008]lock: 0xc0a54f04, currently at: /rest/work/woods/m-NetBSD-4/sys/uvm/uvm_map.h:417
[Thu Mar  6 03:46:59 2008]on CPU 7
[Thu Mar  6 03:46:59 2008]last locked: /rest/work/woods/m-NetBSD-4/sys/uvm/uvm_map.c:1021
[Thu Mar  6 03:46:59 2008]last unlocked: /rest/work/woods/m-NetBSD-4/sys/uvm/uvm_map.c:1011
[Thu Mar  6 03:46:59 2008]once apcupsd[181kernel_map_store(c0983e28,3f303fd,c68f6f38,c0a6b864,7) at 2]: Power failur0xc0a54e00
[Thu Mar  6 03:46:59 2008]e.
[Thu Mar  6 03:46:59 2008]Stopped in pid 1812.2 (apcupsd) at      netbsd:cpu_Debugger+0x4:        popl    %ebp
[Thu Mar  6 03:46:59 2008]db{7}> trace
[Thu Mar  6 11:15:36 2008]cpu_Debugger(d97025d0,1,ffff,c095b5c7,c041cfe0) at netbsd:cpu_Debugger+0x4
[Thu Mar  6 11:15:36 2008]_simple_lock(c0a54f04,c0983424,1a1,d9702680,1702704) at netbsd:_simple_lock+0x331
[Thu Mar  6 11:15:36 2008]uvm_map_prepare(c0a54e00,c0000000,20000,0,ffffffff) at netbsd:uvm_map_prepare+0x329
[Thu Mar  6 11:15:36 2008]uvm_map(c0a54e00,d97026e4,20000,0,ffffffff) at netbsd:uvm_map+0xc0
[Thu Mar  6 11:15:36 2008]km_vacache_alloc(c0a54f60,0,525,206,c09de194) at netbsd:km_vacache_alloc+0x67
[Thu Mar  6 11:15:36 2008]pool_grow(c0a54fd0,c098aa48,3a7,3a4,209f) at netbsd:pool_grow+0x4c
[Thu Mar  6 11:15:36 2008]pool_get(c0a54f60,0,d970279c,202,d970279c) at netbsd:pool_get+0x12f
[Thu Mar  6 11:15:36 2008]uvm_km_alloc_poolpage_cache(c0a54e00,0,525,206,c09de194) at netbsd:uvm_km_alloc_poolpage_cache+0x59
[Thu Mar  6 11:15:36 2008]pool_grow(c0a564d0,c098aa48,3a7,3a4,cb0fc) at netbsd:pool_grow+0x4c
[Thu Mar  6 11:15:36 2008]pool_get(c0a56460,0,d970285c,c0434832,0) at netbsd:pool_get+0x12f
[Thu Mar  6 11:15:36 2008]sadata_upcall_alloc(0,58,d970285c,202,c0a54f2c) at netbsd:sadata_upcall_alloc+0x21
[Thu Mar  6 11:15:36 2008]ltsleep(c0a54e5c,4,c092fb29,0,c0a54f04) at netbsd:ltsleep+0x552
[Thu Mar  6 11:15:36 2008]uvm_map_prepare(c0a54e00,c0000000,20000,0,ffffffff) at netbsd:uvm_map_prepare+0x1b4
[Thu Mar  6 11:15:36 2008]uvm_map(c0a54e00,d9702964,20000,0,ffffffff) at netbsd:uvm_map+0xc0
[Thu Mar  6 11:15:36 2008]km_vacache_alloc(c0a54f60,2,525,202,c09de194) at netbsd:km_vacache_alloc+0x67
[Thu Mar  6 11:15:36 2008]pool_grow(c0a54fd0,c098aa48,3a7,3a4,d8f5c000) at netbsd:pool_grow+0x4c
[Thu Mar  6 11:15:36 2008]pool_get(c0a54f60,2,d9702b0c,202,d0020f04) at netbsd:pool_get+0x12f
[Thu Mar  6 11:15:36 2008]uvm_km_alloc_poolpage_cache(c0a54e00,1,525,202,c09de194) at netbsd:uvm_km_alloc_poolpage_cache+0x59
[Thu Mar  6 11:15:36 2008]pool_grow(c0a712d0,c098aa48,3a7,3a4,0) at netbsd:pool_grow+0x4c
[Thu Mar  6 11:15:36 2008]pool_get(c0a71260,2,80f,3a4,d99d1698) at netbsd:pool_get+0x12f
[Thu Mar  6 11:15:36 2008]pool_cache_get_paddr(c0a71100,2,0,0,0) at netbsd:pool_cache_get_paddr+0x15b
[Thu Mar  6 11:15:36 2008]pmap_create(d99d1798,0,bfc00000,41,d99d1694) at netbsd:pmap_create+0xe2
[Thu Mar  6 11:15:36 2008]uvmspace_init(d99d1694,0,0,bfc00000,d967bf80) at netbsd:uvmspace_init+0x75
[Thu Mar  6 11:15:36 2008]uvmspace_alloc(0,bfc00000,d967bf80,c0983424,1c2) at netbsd:uvmspace_alloc+0x3a
[Thu Mar  6 11:15:36 2008]uvmspace_fork(d967be7c,d9661204,48,de545440,de545440) at netbsd:uvmspace_fork+0x116
[Thu Mar  6 11:15:36 2008]uvm_proc_fork(d9661204,de545440,0,d9702bc4,1) at netbsd:uvm_proc_fork+0x23
[Thu Mar  6 11:15:36 2008]fork1(d961f4bc,0,14,0,0) at netbsd:fork1+0x2f9
[Thu Mar  6 11:15:36 2008]sys_fork(d961f4bc,d9702c48,d9702c68,8166ce4,8166000) at netbsd:sys_fork+0x45
[Thu Mar  6 11:15:36 2008]syscall_plain() at netbsd:syscall_plain+0x1a8
[Thu Mar  6 11:15:36 2008]--- syscall (number 2) ---
[Thu Mar  6 11:15:36 2008]0x80f6613:
[Thu Mar  6 11:15:36 2008]db{7}> call simple_lock_dump
[Thu Mar  6 11:15:46 2008]all simple locks:
[Thu Mar  6 11:15:46 2008]0xc0a6b854 CPU 7 /rest/work/woods/m-NetBSD-4/sys/kern/kern_lock.c:1476
[Thu Mar  6 11:15:46 2008]0xc0a54f04 CPU 7 /rest/work/woods/m-NetBSD-4/sys/uvm/uvm_map.c:1021
[Thu Mar  6 11:15:46 2008]0x80000000
[Thu Mar  6 11:15:46 2008]db{7}> show reg
[Thu Mar  6 11:15:59 2008]ds          0x10
[Thu Mar  6 11:15:59 2008]es          0x10
[Thu Mar  6 11:15:59 2008]fs          0x30
[Thu Mar  6 11:15:59 2008]gs          0x10
[Thu Mar  6 11:15:59 2008]edi         0x1
[Thu Mar  6 11:15:59 2008]esi         0
[Thu Mar  6 11:15:59 2008]ebp         0xd97025ac
[Thu Mar  6 11:15:59 2008]ebx         0x1
[Thu Mar  6 11:15:59 2008]edx         0x10
[Thu Mar  6 11:15:59 2008]ecx         0x1
[Thu Mar  6 11:15:59 2008]eax         0
[Thu Mar  6 11:15:59 2008]eip         0xc04e84b4  cpu_Debugger+0x4
[Thu Mar  6 11:15:59 2008]cs          0x8
[Thu Mar  6 11:15:59 2008]eflags      0x202
[Thu Mar  6 11:15:59 2008]esp         0xd97025ac
[Thu Mar  6 11:15:59 2008]ss          0x10
[Thu Mar  6 11:15:59 2008]netbsd:cpu_Debugger+0x4:        popl    %ebp
[Thu Mar  6 11:15:59 2008]db{7}> reboot
[Thu Mar  6 11:16:09 2008]syncing disks... 
[Thu Mar  6 11:16:11 2008]switching with held simple_lock 0xc0a54f04 CPU 7 /rest/work/woods/m-NetBSD-4/sys/uvm/uvm_map.c:1021
[Thu Mar  6 11:16:11 2008]copyright(0,0,c5ea1800,d961f4bc,0) at 0xc0989310
[Thu Mar  6 11:16:11 2008]Bad frame pointer: 0xc6791c40
[Thu Mar  6 11:16:11 2008]Stopped in pid 1812.2 (apcupsd) at      netbsd:cpu_Debugger+0x4:        popl    %ebp
[Thu Mar  6 11:16:11 2008]db{7}> reboot
[Thu Mar  6 11:16:23 2008]rebooting...


        Note that my recent PR# 38019 may be the same problem.  At that
        time I was not running with LOCKDEBUG, so the symptoms were very
        different, even though the apparent cause was the same (the
        nightly cron jobs, and in particular their large "find" runs).

>How-To-Repeat:

>Fix:

        unknown, but highly desired!


