Subject: kern/32631: Bad concurrency checking can cause a crash in sys/kern/subr_pool.c
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: None <yves-emmanuel.jutard@fr.thalesgroup.com>
List: netbsd-bugs
Date: 01/25/2006 16:05:02
>Number:         32631
>Category:       kern
>Synopsis:       Bad concurrency checking can cause a crash in sys/kern/subr_pool.c
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jan 25 16:05:01 +0000 2006
>Originator:     Yves-Emmanuel JUTARD
>Release:        3.0.0
>Organization:
THALES Communication
>Environment:
Custom environment: recompiled from /src; only some parts of NetBSD are used (the TCP/IP stack and some parts of the kernel).
>Description:
In file sys/kern/subr_pool.c, revision 1.99.8.1, function 'pool_get' (line 796):
at line 1038, pool_get can call 'pool_catchup' on a pool that is already entered ('pp' was entered by 'pr_enter' at line 818).
Under specific conditions, pool_catchup(pp) can call pool_allocator_alloc(pp), which can call pool_reclaim(pp), which calls pr_enter(pp). That call fails and crashes the kernel, since 'pp' is already entered.
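The call chain above can be sketched as a small user-space C model. This is an illustration under stated assumptions, not the kernel code: struct model_pool and all model_* functions are hypothetical stand-ins, the re-entrancy check returns an error instead of crashing so the failure can be observed, and locking is not modeled. With low_memory set, model_pool_get_buggy() reaches model_pr_enter() a second time while the pool is still entered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical user-space model of the re-entrancy check. These are
 * illustrative stand-ins, not the real NetBSD definitions; the real
 * pr_enter() crashes the kernel instead of returning an error code.
 */
struct model_pool {
	const char *entered_file;	/* non-NULL while the pool is "entered" */
	long entered_line;
	int low_memory;			/* simulate the low-memory condition */
};

/* Returns 0 on success, -1 if the pool was already entered (re-entry). */
static int
model_pr_enter(struct model_pool *pp, const char *file, long line)
{
	if (pp->entered_file != NULL) {
		fprintf(stderr, "reentrancy at %s:%ld, previous entry at %s:%ld\n",
		    file, line, pp->entered_file, pp->entered_line);
		return -1;
	}
	pp->entered_file = file;
	pp->entered_line = line;
	return 0;
}

static void
model_pr_leave(struct model_pool *pp)
{
	assert(pp->entered_file != NULL);	/* leaving an un-entered pool is a bug */
	pp->entered_file = NULL;
	pp->entered_line = 0;
}

/* Stands in for pool_reclaim(), which enters the pool it shrinks. */
static int
model_pool_reclaim(struct model_pool *pp)
{
	if (model_pr_enter(pp, __FILE__, __LINE__) != 0)
		return -1;
	/* ... release idle pages ... */
	model_pr_leave(pp);
	return 0;
}

/* Stands in for pool_allocator_alloc(): reclaims under memory pressure. */
static int
model_pool_allocator_alloc(struct model_pool *pp)
{
	return pp->low_memory ? model_pool_reclaim(pp) : 0;
}

static int
model_pool_catchup(struct model_pool *pp)
{
	return model_pool_allocator_alloc(pp);
}

/* Reported ordering: pr_leave() runs only after pool_catchup(). */
static int
model_pool_get_buggy(struct model_pool *pp)
{
	int error;

	if (model_pr_enter(pp, __FILE__, __LINE__) != 0)
		return -1;
	/* ... take an item from the pool ... */
	error = model_pool_catchup(pp);	/* pool is still entered here */
	model_pr_leave(pp);
	return error;
}
```

Calling model_pool_get_buggy() on a pool with low_memory set returns -1 and prints the re-entrancy message, mirroring the crash path described above; with low_memory clear it returns 0.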
I have experienced crashes caused by this on our custom board with limited memory.
>How-To-Repeat:
Run NetBSD on a low-memory system.
>Fix:
The solution is to call 'pr_leave(pp)' just before calling 'pool_catchup(pp)' in pool_get.
pr_leave(pp) is currently called after the call to pool_catchup, at line 1046; I suggest moving it before that call, to line 1034.
This is valid because we have finished manipulating the pool at that point, so we can safely leave it.
It works for me.
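A minimal sketch of the proposed reordering, again as a hypothetical user-space model: struct model_pool and the model_* functions are not NetBSD names, and locking and the real pool bookkeeping are omitted. model_pool_get_fixed() calls model_pr_leave() before model_pool_catchup(), so the nested model_pr_enter() on the reclaim path finds the pool no longer entered.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical user-space model of the proposed fix. The names are
 * illustrative stand-ins for pr_enter(), pr_leave(), pool_catchup()
 * and pool_reclaim(); the real kernel code does far more work.
 */
struct model_pool {
	int entered;		/* 1 while the pool is "entered" */
	int low_memory;		/* simulate the low-memory condition */
};

/* Returns 0 on success, -1 on re-entry (the real pr_enter() crashes). */
static int
model_pr_enter(struct model_pool *pp)
{
	if (pp->entered)
		return -1;
	pp->entered = 1;
	return 0;
}

static void
model_pr_leave(struct model_pool *pp)
{
	assert(pp->entered);	/* leaving an un-entered pool is a bug */
	pp->entered = 0;
}

/* Reclaiming enters the pool, so it must not run while it is entered. */
static int
model_pool_reclaim(struct model_pool *pp)
{
	if (model_pr_enter(pp) != 0)
		return -1;
	/* ... release idle pages ... */
	model_pr_leave(pp);
	return 0;
}

/* Stands in for pool_catchup() -> pool_allocator_alloc() -> pool_reclaim(). */
static int
model_pool_catchup(struct model_pool *pp)
{
	return pp->low_memory ? model_pool_reclaim(pp) : 0;
}

/* Proposed ordering: leave the pool before calling pool_catchup(). */
static int
model_pool_get_fixed(struct model_pool *pp)
{
	if (model_pr_enter(pp) != 0)
		return -1;
	/* ... take an item from the pool; manipulation is finished ... */
	model_pr_leave(pp);
	return model_pool_catchup(pp);	/* reclaim can now enter safely */
}
```

With this ordering, a low-memory call returns 0 and leaves the pool un-entered. The sketch only shows the re-entrancy bookkeeping; whether it is safe for pool_catchup() to run outside the entered section in the real kernel depends on the rest of pool_get(), which is not reproduced here.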