Subject: RE: kern/32631: Bad concurrency checking can cause a crash in sys
To: None <christos@zoulas.com>
From: None <Yves-Emmanuel.JUTARD@fr.thalesgroup.com>
List: netbsd-bugs
Date: 01/26/2006 15:47:05
>  Can you send a diff please?

Here it is. (I'm not used to diff, so I hope I am sending the right thing.)
Based on subr_pool.c v1.99.8.1

diff subr_pool.c subr_pool_corrected.c
1032a1033
>       pr_leave(pp);
1046d1046
<       pr_leave(pp);


This fix has worked for me so far, but since I'm not a kernel expert, I may be missing some things. So be careful ;-)
The "unlock" near line 1047 should not be moved, because 'pool_catchup' expects the pool to still be locked when it is called.
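
To make the ordering problem easier to see, here is a toy model in C. It is only a sketch, not the code from subr_pool.c: every toy_* name and the struct layout are made up for illustration, the pool's lock is not modelled at all, and the place where the real pr_enter would crash is replaced by a counter so that both orderings can be compared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for the kernel's pool. */
struct toy_pool {
    const char *entered_by; /* non-NULL while a function has "entered" the pool */
    int reentry_errors;     /* incremented where the real code would crash */
    bool low_memory;        /* forces the allocation path to try a reclaim */
};

/* Models pr_enter: entering an already-entered pool is the fatal case. */
static bool toy_pr_enter(struct toy_pool *pp, const char *who)
{
    if (pp->entered_by != NULL) {
        pp->reentry_errors++;
        return false;
    }
    pp->entered_by = who;
    return true;
}

/* Models pr_leave: only valid on a pool that is currently entered. */
static void toy_pr_leave(struct toy_pool *pp)
{
    assert(pp->entered_by != NULL);
    pp->entered_by = NULL;
}

/* Models pool_reclaim, which enters the pool itself. */
static void toy_pool_reclaim(struct toy_pool *pp)
{
    if (!toy_pr_enter(pp, "toy_pool_reclaim"))
        return;
    /* ... idle pages would be released here ... */
    toy_pr_leave(pp);
}

/* Models pool_catchup: under memory pressure the allocator may reclaim. */
static void toy_pool_catchup(struct toy_pool *pp)
{
    if (pp->low_memory)
        toy_pool_reclaim(pp);
}

/* Ordering before the patch: catch up while the pool is still entered. */
static int toy_pool_get_old(struct toy_pool *pp)
{
    (void)toy_pr_enter(pp, "toy_pool_get");
    /* ... an item would be taken from the pool here ... */
    toy_pool_catchup(pp);
    toy_pr_leave(pp);
    return pp->reentry_errors;
}

/* Ordering after the patch: leave the pool first, then catch up. */
static int toy_pool_get_patched(struct toy_pool *pp)
{
    (void)toy_pr_enter(pp, "toy_pool_get");
    /* ... an item would be taken from the pool here ... */
    toy_pr_leave(pp);
    toy_pool_catchup(pp);
    return pp->reentry_errors;
}
```

With low_memory set, toy_pool_get_old records one re-entry error (the point where the real kernel would go down), while toy_pool_get_patched records none. With low_memory clear, neither ordering hits the reclaim path, which fits the crash only showing up on a board with little memory.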

Yves-Emmanuel.



-----Original Message-----
From: christos@zoulas.com [mailto:christos@zoulas.com]
Sent: Wednesday, January 25, 2006 17:25
To: kern-bug-people@netbsd.org; gnats-admin@netbsd.org;
netbsd-bugs@netbsd.org; yves-emmanuel.jutard@fr.thalesgroup.com
Subject: Re: kern/32631: Bad concurrency checking can cause a crash in
sys/kern/subr_pool.c


The following reply was made to PR kern/32631; it has been noted by GNATS.

From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org,
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: 
Subject: Re: kern/32631: Bad concurrency checking can cause a crash in sys/kern/subr_pool.c
Date: Wed, 25 Jan 2006 11:22:24 -0500

 On Jan 25,  4:05pm, yves-emmanuel.jutard@fr.thalesgroup.com (yves-emmanuel.jutard@fr.thalesgroup.com) wrote:
 -- Subject: kern/32631: Bad concurrency checking can cause a crash in sys/ker
 
 | >Number:         32631
 | >Category:       kern
 | >Synopsis:       Bad concurrency checking can cause a crash in sys/kern/subr_pool.c
 | >Confidential:   no
 | >Severity:       critical
 | >Priority:       medium
 | >Responsible:    kern-bug-people
 | >State:          open
 | >Class:          sw-bug
 | >Submitter-Id:   net
 | >Arrival-Date:   Wed Jan 25 16:05:01 +0000 2006
 | >Originator:     Yves-Emmanuel JUTARD
 | >Release:        3.0.0
 | >Organization:
 | THALES Communication
 | >Environment:
 | custom environment : recompiled from /src, only some parts of NetBSD are used (TCP/IP stack and some parts of the kernel)
 | >Description:
 | in file sys/kern/subr_pool.c,v 1.99.8.1,
 | in function 'pool_get' (l. 796)
 | at line 1038, pool_get can call 'pool_catchup' on an 'entered' pool (pp, entered via 'pr_enter' at line 818).
 | Under specific conditions, pool_catchup(pp) can call pool_allocator_alloc(pp), which can call 'pool_reclaim(pp)', which calls 'pr_enter(pp)', which fails and crashes, since 'pp' is already entered!
 | I have experienced crashes because of that, on our custom board with limited memory.
 | >How-To-Repeat:
 | Use NetBSD on a low mem system.
 | >Fix:
 | The solution is to call 'pr_leave(pp)' just before calling 'pool_catchup(pp)' in pool_get.
 | pr_leave(pp) is normally called AFTER the call to pool_catchup, line 1046.
 | I suggest moving it BEFORE, line 1034.
 | This is valid because we have finished manipulating the pool, so we can "leave" it peacefully.
 | It works for me.
 
 Can you send a diff please?
 
 Thanks,
 
 christos