tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: kmem-pool-uvm

Lars Heidieker <> wrote:
> The patch provided includes the changed extent as well as a changed
> implementation of kmem, which provides page aligned memory for
> allocation >= PAGE_SIZE. The interface between the pool-subsystem and
> uvm_km is changed by passing the pool_allocators page_size to the
> uvm_km functions, the idea behind this is to have multiply default
> pool_allocators with different pool-page-sizes to lower inner
> fragmentation within the pools.
> In order to support those different-pool-page-sizes the kernel_map and
> kmem_map gained caches for virtual addresses not only for PAGE_SIZE
> but low integer multiplies of PAGE_SIZE.
> These large then PAGE_SIZE caches are used by the the larger the
> PAGE_SIZE allocations in kmem.


- What are the effects of replacing vmem(9) to fragmentation?  Do you
  have some numbers that kmem_cache_big_sizes is doing a good job?

- The patch decreases the variety of small-sized caches (e.g. no more
  8, 24, 40, 56, etc).  On what basis?  Was it measured?  It might be
  sensitive!  See here:

- Can you try regress/sys/kern/allocfree/allocfree.c with your patch?

If yamt@ says OK for replacing the use of vmem(9), as he probably has a
best understanding of fragmentation issues, then you have a green light
for the patch! :)

Also, I suppose by mistake, patch has removed kmem_poison_check() and the
following checks:

-       KASSERT(!cpu_intr_p());
-       KASSERT(!cpu_softintr_p());

These are must-have, critical mechanisms for catching bugs!

uvm_km_alloc_poolpage_cache() and uvm_km_alloc_poolpage() now take size
argument.  Please KASSERT() that size is dividable by PAGE_SIZE.  Also,
in uvm_km_free_poolpage_cache(), the following was removed:

-#if defined(DEBUG)
-       pmap_update(pmap_kernel());

Since interaction between layers here is already a bit confusing, this
definitely needs a comment i.e. why leaving a stale entry is OK.  Which
is because of KVA reservation (as it is KVA cache), where TLB flush would
be performed in km_vacache_free().  (A case of KVA starvation, normally?)


Few other comments on the patch.

Please use diff -p option when generating the patches.  Also, splitting
mechanical parts and functional into a separate diffs would ease review,
and e.g. malloc to kmem conversions can/should be committed separately.

-       hd = malloc(sizeof(*hd), M_DEVBUF, M_NOWAIT);
+       hd = kmem_alloc(sizeof(struct hook_desc), KM_NOSLEEP);

All such M_NOWAIT cases should be audited that there are no uses from
interrupt context, and if not - changed to KM_SLEEP (check for NULL can
be removed too).  Generally, there should be no KM_NOSLEEP uses without
a very good reason.

-       t = malloc(len + sizeof(struct m_tag), M_PACKET_TAGS, wait);
+       t = kmem_alloc(len + sizeof(struct m_tag), 
+               wait ? KM_SLEEP : KM_NOSLEEP);

mbuf tags are created from interrupt context, so this is a bug (those
removed KASSERT()s would have caught)!

-                       free(corename, M_TEMP);
+                       kmem_free(corename, strlen(corename) + 1);

Hmm, these are a bit inefficient.  I have been thinking about this..
If, for nearly all cases, size is either constant or can-be/is normally
saved in some structures, and only strings are real, justified cases to
store the size within allocation, e.g. in a byte or word before the
actual string - perhaps add kmem_stralloc() and kmem_strfree()?

+       size += REDZONE_SIZE + SIZE_SIZE;
+        if ((index = ((size - 1) >> KMEM_TINY_SHIFT)) < (KMEM_TINY_MAXSIZE >> 
+                pc = kmem_cache_tiny[index];
+        } else if ((index = ((size - 1) >> KMEM_BIG_SHIFT)) < 
+                pc = kmem_cache_big[index];
+        } else {
+                uvm_km_free(kernel_map, (vaddr_t)p, round_page(size), 
+               return;
+        }

Please KNF, clean-up (the whole patch).  Lines should be no longer than 80
characters.  When wrapping long lines, second level indents have four extra
spaces (not a tab).  There are many cases in the patch with the trailing,
added or missed whitespaces, etc.

- * to only a part of an amap).  if the malloc of the array fails
+ * to only a part of an amap).  if the kmem of the array fails

In such places use word "allocation", no need to be allocator-specific. :)

        rw_enter(&swap_syscall_lock, RW_WRITER);
-       userpath = malloc(SWAP_PATH_MAX, M_TEMP, M_WAITOK);
+       userpath = kmem_alloc(SWAP_PATH_MAX, KM_SLEEP);
-       free(userpath, M_TEMP);
+       kmem_free(userpath, SWAP_PATH_MAX);

Opportunity to move out kmem_*() outside the lock, as it is safe!

-               sdp = malloc(sizeof *sdp, M_VMSWAP, M_WAITOK);
-               spp = malloc(sizeof *spp, M_VMSWAP, M_WAITOK);
+               sdp = kmem_alloc(sizeof *sdp, KM_SLEEP);
+               spp = kmem_alloc(sizeof *spp, KM_SLEEP);
                memset(sdp, 0, sizeof(*sdp));

KNF: sizeof *sdp -> sizeof(*sdp).  Also, -memset(), +kmem_zalloc().


Home | Main Index | Thread Index | Old Index