Subject: Re: Some locking issues in subr_pool.c
To: Jason Thorpe <thorpej@shagadelic.org>
From: Chuck Silvers <chuq@chuq.com>
List: tech-kern
Date: 04/30/2005 16:45:41
hi,

yea, I've always thought the locking in the pool code was a little whacky.
your changes look reasonable to me.  as for your XXXJRTs:

+       /*
+        * Destroy all caches for this pool.
+        * XXXJRT We don't lock here, because the locking order is
+        * pool_cache -> pool.  Maybe it doesn't matter because the
+        * pool is being destroyed anyway.  Or maybe we should panic
+        * here if the pool_caches are not already gone.
+        */

I'd say we should assert that there are no pool_caches referring to this pool.


+                       /*
+                        * XXXJRT Could improve this by calling
+                        * pool_do_put() and freeing the entire
+                        * page list at the end.
+                        */

yea, that would be better.


-Chuck



On Fri, Apr 29, 2005 at 11:59:17AM -0700, Jason Thorpe wrote:
> Folks...
> 
> I noticed some locking issues in subr_pool.c the other day:
> 
> - Locking protocol for pr_rmpage() was hideous.  Sometimes called  
> with the pool locked, sometimes not.  And in the "not" case, it was  
> modifying fields normally protected by the lock.
> 
> - Fixing the above requires always deferring freeing the pool page to  
> the back-end allocator, because the pool must not be locked when the  
> pool_allocator is called (lock ordering is pool_allocator -> pool;  
> see pool_allocator_free()).  This was a mostly mechanical change.
> 
> - When doing the above mechanical change, I noticed that pool_reclaim 
> () violates the pool_cache -> pool lock ordering.  I didn't see a  
> good way to avoid that problem, so pool_cache_reclaim() has to use a  
> trylock, and bail out if it that fails.  This is not a huge deal,  
> since we already trylock the pool at the beginning of pool_reclaim(),  
> and this is only an opportunistic memory reclamation anyway.
> 
> The following patch should fix these problems.  I have not even tried  
> to compile it yet, but I would like to get comments on it.
> 
> -- thorpej
>