
Re: Is pool_cache_init limited to providing only one page worth of storage?



On Tue, Oct 22, 2024 at 11:18:49AM -0700, Brian Buhrow wrote:
> 	hello.  I'm experimenting with modifying the if_xennet_xenbus driver to enlarge the pool.
> The current code allocates one page of space for the pool cache.  When I try doubling that  to
> 2 * PAGESIZE, I get the following panic:
> 
> [  57.0902893] panic: pr_phinpage_check: [xnfrx] item 0xffff829f53f5a000 not part of pool
> 
> In looking through the sources, it looks like nothing allocates more than a page sized chunk of
> data when it calls pool_cache_init.  Yet, I don't see anything in the man page that suggests
> the limit is 1 page of space.  Is this a known limitation or is it a bug?  If it's known, where
> is it documented?

it's a muddle.

the problem you're seeing is that the object that you're allocating from
the pool_cache (and its underlying pool) is larger than the underlying
pool's back-end allocator's "page" size.  if_xennet_xenbus passes NULL
for the allocator:

                if_xennetrxbuf_cache = pool_cache_init(PAGE_SIZE, 0, 0, 0,
                    "xnfrx", NULL, IPL_NET, NULL, NULL, NULL);


so a default allocator is chosen automatically by this code in
pool_cache_bootstrap():

	if (palloc == NULL && ipl == IPL_NONE) {
		if (size > PAGE_SIZE) {
			int bigidx = pool_bigidx(size);

			palloc = &pool_allocator_big[bigidx];
			flags |= PR_NOALIGN;
		} else
			palloc = &pool_allocator_nointr;
	}


however the ipl for this pool_cache is IPL_NET rather than IPL_NONE,
so pool_cache_bootstrap() leaves palloc NULL, and thus it defaults to
this later in pool_init():

	if (palloc == NULL)
		palloc = &pool_allocator_kmem;


this allocator is defined as:

struct pool_allocator pool_allocator_kmem = {
	.pa_alloc = pool_page_alloc,
	.pa_free = pool_page_free,
	.pa_pagesz = 0
};


that pa_pagesz of 0 is converted to PAGE_SIZE the first time that
the allocator is used in pool_init():

	if (palloc->pa_refcnt++ == 0) {
		if (palloc->pa_pagesz == 0)
			palloc->pa_pagesz = PAGE_SIZE;


I guess your kernel was built with DIAGNOSTIC disabled?  there's an assertion
in pool_init() that you would have hit, which would have made the problem more obvious:

	KASSERTMSG((prsize <= palloc->pa_pagesz),
	    "%s: [%s] pool item size (%zu) larger than page size (%u)",
	    __func__, wchan, prsize, palloc->pa_pagesz);
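
(in case it helps: DIAGNOSTIC is a kernel config option that turns on the
KASSERT/KASSERTMSG consistency checks, so a config fragment along the lines
of

	options 	DIAGNOSTIC	# kernel consistency checks

followed by a kernel rebuild would have caught this at pool_cache_init()
time instead of later at the panic.)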


as to whether this is intentional or a bug, the manpage for pool_cache_init says:

              palloc

                    Should typically be set to NULL, instructing
                    pool_cache_init() to select an appropriate back-end
                    allocator.  Alternate allocators can be used to partition
                    space from arbitrary sources.  Use of alternate allocators
                    is not documented here as it is not a stable, endorsed
                    part of the API.

currently you would need to provide a custom allocator to support a back-end
allocator "page" size larger than PAGE_SIZE for your non-IPL_NONE pool_cache,
so the documentation kind of says that you shouldn't do what you are trying to do.
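
just for illustration (an untested sketch; "xennet_rxbuf_allocator" is a
made-up name), such a custom allocator could simply reuse pool_page_alloc()
and pool_page_free() with a larger back-end "page" size:

	/* hypothetical back-end allocator with a 2 * PAGE_SIZE "page" */
	static struct pool_allocator xennet_rxbuf_allocator = {
		.pa_alloc = pool_page_alloc,	/* allocates pa_pagesz bytes */
		.pa_free = pool_page_free,
		.pa_pagesz = 2 * PAGE_SIZE,
	};

	/*
	 * PR_NOALIGN mirrors what pool_cache_bootstrap() does for the
	 * big allocators, since these pages aren't pa_pagesz-aligned.
	 */
	if_xennetrxbuf_cache = pool_cache_init(2 * PAGE_SIZE, 0, 0,
	    PR_NOALIGN, "xnfrx", &xennet_rxbuf_allocator, IPL_NET,
	    NULL, NULL, NULL);

as far as I can tell that's essentially what the pool_allocator_big[]
entries do anyway.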

the current rule that a non-IPL_NONE pool_cache (or any pool) will only support
object sizes up to PAGE_SIZE without a custom back-end allocator is not really
documented anywhere that I can find.

your next question will be why the "big" default allocators can't be used
for pool_caches with ipl > IPL_NONE.  I was actually the one who introduced
the big allocators, for the benefit of ZFS, which allocates objects larger
than PAGE_SIZE from pool_caches via wrappers that provide the kmem_cache*
API from solaris, but I don't recall any reason why I made the pool_cache
code use those allocators by default only for pools with ipl == IPL_NONE.
the attached patch may allow your change to if_xennet_xenbus to work the
way you were hoping; please give it a try.
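
(for reference, with the patch applied your modified driver should be able
to keep passing a NULL allocator, i.e. something like

	if_xennetrxbuf_cache = pool_cache_init(2 * PAGE_SIZE, 0, 0, 0,
	    "xnfrx", NULL, IPL_NET, NULL, NULL, NULL);

and pool_init() will then pick one of the pool_allocator_big[] entries
automatically.  this is just a sketch of the intended usage, untested.)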

-Chuck
Index: src/sys/kern/subr_pool.c
===================================================================
RCS file: /home/chs/netbsd/cvs/src/sys/kern/subr_pool.c,v
retrieving revision 1.290
diff -u -p -r1.290 subr_pool.c
--- src/sys/kern/subr_pool.c	9 Apr 2023 12:21:59 -0000	1.290
+++ src/sys/kern/subr_pool.c	27 Oct 2024 21:31:54 -0000
@@ -830,8 +830,18 @@ pool_init(struct pool *pp, size_t size, 
 		mutex_exit(&pool_head_lock);
 #endif
 
-	if (palloc == NULL)
-		palloc = &pool_allocator_kmem;
+	if (palloc == NULL) {
+		if (size > PAGE_SIZE) {
+			int bigidx = pool_bigidx(size);
+
+			palloc = &pool_allocator_big[bigidx];
+			flags |= PR_NOALIGN;
+		} else if (ipl == IPL_NONE) {
+			palloc = &pool_allocator_nointr;
+		} else {
+			palloc = &pool_allocator_kmem;
+		}
+	}
 
 	if (!cold)
 		mutex_enter(&pool_allocator_lock);
@@ -2115,16 +2125,6 @@ pool_cache_bootstrap(pool_cache_t pc, si
 	unsigned int ppflags;
 
 	pp = &pc->pc_pool;
-	if (palloc == NULL && ipl == IPL_NONE) {
-		if (size > PAGE_SIZE) {
-			int bigidx = pool_bigidx(size);
-
-			palloc = &pool_allocator_big[bigidx];
-			flags |= PR_NOALIGN;
-		} else
-			palloc = &pool_allocator_nointr;
-	}
-
 	ppflags = flags;
 	if (ctor == NULL) {
 		ctor = NO_CTOR;

