tech-kern archive


Re: kmem-pool-uvm


On 04/20/11 03:22, YAMAMOTO Takashi wrote:
> hi,
> Hi,
> On 04/14/11 09:05, YAMAMOTO Takashi wrote:
>>>> why do you want to make subr_kmem use uvm_km directly?
>>>> to simplify the code?
>>>> i don't want to see that change, unless there's a clear benefit.
> The reason was to simplify the code, yes, and to reduce redundancy:
> in the current implementation vmem allocates PAGE_SIZE of memory
> from the uvm_km backend for requests <= PAGE_SIZE without utilizing
> the vacache, and, more importantly, vmem is essentially just taking the
> address allocations made by uvm_map.
> With the changes I see about 15% fewer kernel map entries.
>>>> let me explain some background. currently there are a number of
>>>> kernel_map related problems:
>>>> A-1. vm_map_entry is unnecessarily large for KVA allocation purpose.
>>>> A-2. kernel-map-entry-merging is there to solve A-1. but it introduced
>>>> the allocate-for-free problem. ie. to free memory, you might need to
>>>> split map-entries thus allocate some memory.
>>>> A-3. to solve A-2, there is map-entry-reservation mechanism. it's
>>>> complicated and broken.
>>>> B. kernel fault handling is complicated because it needs memory allocation
>>>> (eg. vm_anon) which needs some trick to avoid deadlock.
>>>> C. KVA allocation is complicated because it needs memory allocation
>>>> (eg. vm_map_entry) which needs some trick to avoid deadlock.
>>>> most of the above can be solved by separating KVA allocation and
>>>> kernel fault handling. (except C, which will be merely moved to a
>>>> different place.)
> A-1 with vmem_btag being slightly less than half the size of
> vm_map_entry...
> A-2 solves A-1, but A-3 solves A-2 with the pitfall of reintroducing
> part of A-1: we have fewer map entries in the map, but we don't
> save memory, as all the entries not in the map are cached aside for
> potential merging.
> In this sense it seems both broken and complicated to me.
> Reducing the overall allocated map_entries will help here, as vacaches do.
> C seems to be inevitable; it's only a question of where it happens...
> B is a result of having pageable memory, which can fault, and
> non-pageable memory in the same map, with the need to allocate
> non-pageable memory in the event of a page fault.
>>>> i implemented subr_vmem so that eventually it can be used as the primary
>>>> KVA allocator. ie. when allocating from kernel_map, allocate KVA from
>>>> kernel_va_arena first and then, if and only if necessary, register it to
>>>> kernel_map for fault handling. it probably allows us to remove VACACHE
>>>> stuff, too. kmem_alloc will be backed by a vmem arena which is backed by
>>>> kernel_va_arena.
> Originally I thought about two options with option one being what my
> patch does and two:
> If vmem is made the primary kva allocator, we should carve out a
> kernel heap entirely controlled by vmem, probably one special
> vm_map_entry in the kernel_map that spans the heap or a submap that
> never has any map_entries.
> Essentially separating pageable and non-pageable memory allocations,
> this would allow for removing the vacaches in the kernel-maps as well
> as the map-entry-reservation mechanism.
> Questions that follow:
> - how to size it properly.....
>> is this about limiting total size for a particular allocation?
> - this might be the kmem_map? or two heaps, an interrupt-safe one and
> one non-interrupt-safe?
>> because kernel_va_arena would be quantum cache disabled,
>> most users would use another arena stacked on it.
>> (like what we currently have as kmem_arena.)
>> interrupt-safe allocations can either use kernel_va_arena directly or
>> have another arena eg. kmem_arena_intrsafe.
> I think having two "allocators" (vmem and the vm_map_(entries) itself)
> controlling the kernel_map isn't a good idea as both have to be in
> sync; at least every allocation that is made by vm_map_entries needs to
> be made in vmem as well. There is no clear responsibility for either.
>> i agree that having two allocators for KVA is bad.
>> my idea is having just one. (kernel_va_arena)
>> no allocation would be made by vm_map_entries for kernel_map.
>> kernel_map is kept merely for fault handling.
>> essentially kva allocation would be:
>>      va = vmem_alloc(kernel_va_arena, ...);
>>      if (pageable)
>>              create kernel_map entry for the va
>>      else
>>              ...
>>      return va;
> Option two is more challenging and will solve problems B and As while
> option one solves most of the As leaving B untouched.
>> sure, it's more challenging and involves more work.
>> (so it hasn't finished yet. :-)
>> YAMAMOTO Takashi
> Lars
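
For reference, the allocation path in the quoted pseudocode (take the KVA
from kernel_va_arena first, and create a kernel_map entry only when the
memory is pageable) can be modeled with a toy sketch. Everything below is a
hypothetical stand-in: toy_arena is a trivial bump allocator and
toy_fault_map only counts entries. Neither is the vmem(9) or uvm_map
interface, and the non-pageable branch, left as "..." in the quoted
pseudocode, is not filled in here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for kernel_va_arena: a bump allocator over [next, end). */
struct toy_arena {
	uintptr_t next;
	uintptr_t end;
};

/* Toy stand-in for kernel_map: only counts registered entries. */
struct toy_fault_map {
	size_t nentries;
};

/* Hand out `size` bytes of address space, or 0 if the arena is exhausted. */
uintptr_t
toy_arena_alloc(struct toy_arena *a, size_t size)
{
	uintptr_t va;

	if (size == 0 || size > a->end - a->next)
		return 0;
	va = a->next;
	a->next += size;
	return va;
}

/*
 * The arena is the only KVA allocator; the map is touched only for
 * pageable memory, where it is needed for fault handling.
 */
uintptr_t
toy_kva_alloc(struct toy_arena *a, struct toy_fault_map *m, size_t size,
    bool pageable)
{
	uintptr_t va = toy_arena_alloc(a, size);

	if (va == 0)
		return 0;
	if (pageable)
		m->nentries++;	/* "create kernel_map entry for the va" */
	return va;
}
```

In this model a non-pageable allocation never creates a map entry, which is
the property that makes the map-entry-reservation mechanism unnecessary.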


I've made some progress in exploring both options further.
Two patches implementing either option:

Option a has extended kva caches for both kernel_map and kmem_map, with
interfaces to them that are used by kmem(9), malloc(9) and pool(9), with
the exception that the pool_allocator_meta goes directly to the
kmem_map. (This means malloc(9) and kmem(9) use kva caches, resulting in
a lower vm_map_entry count.)

Option b has one vm_map_entry in the kernel_map spanning the
kernel_heap, which in turn is controlled by vmem(9).
There is the heap_arena, from which the heap_va_arena (with quantum
caches) imports, as well as an internal arena for vmem's meta data.
On top of the heap_va_arena are interfaces used by kmem(9), malloc(9)
and pool(9), with the pool meta data allocator going to vmem's meta arena.
Originally I had another arena on top of the heap_va_arena, which backed
the virtual memory with physical pages on import and from which
malloc(9), kmem(9) and pool(9) allocated; let's call this option c.
I replaced this arena with interface functions for efficiency reasons.

Findings after having run the system for a while and having about 1.1gig
in the pool(9)s:
Option a: about 30000 allocated kernel map_entries (not in the map but
Option b: about 100000 allocated boundary tags.
Option c: about 400000 allocated boundary tags.

With boundary tags being about half the size of vm_map_entries, the vmem
version uses slightly more memory, but not much more.
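
To make the comparison concrete: if S is the size of a vm_map_entry and a
boundary tag is taken as exactly S/2 (the mail only says "about half"),
option a costs about 30000 S, option b about 50000 S and option c about
200000 S. A minimal sketch of that arithmetic follows; the function names
are made up for illustration, and entry_size is a parameter because the
actual structure sizes are not given here.

```c
#include <stddef.h>

/* Bytes used by n map entries of entry_size bytes each. */
size_t
sketch_map_entry_bytes(size_t nentries, size_t entry_size)
{
	return nentries * entry_size;
}

/*
 * Bytes used by n boundary tags, ASSUMING a tag is exactly half the
 * size of a map entry ("about half" in the text above).
 */
size_t
sketch_btag_bytes(size_t ntags, size_t entry_size)
{
	return ntags * (entry_size / 2);
}
```

With a purely illustrative entry_size of 128 bytes this gives roughly
3.8 MB for option a, 6.4 MB for option b and 25.6 MB for option c, all
small next to the 1.1gig held in the pools.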

Both versions use a modified kmem(9) that interfaces either with vmem or
with the extended kva caches. It returns page-aligned memory for
allocations of page size and larger, and cache-line-aligned memory for
allocations between cache-line size and page size.
This should resolve some problems that Xen kernels have.
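
The alignment rule can be written down as a small sketch. This is not the
code from either patch: the function name is hypothetical, treating a
request of exactly cache-line size as cache-line aligned is an assumption,
and so is the pointer-size fallback for smaller requests, which the
description above does not cover.

```c
#include <stddef.h>

/*
 * Pick the alignment for a request of `size` bytes.
 * cache_line and page_size are passed in rather than taken from
 * machine headers, to keep the sketch self-contained.
 */
size_t
sketch_kmem_align(size_t size, size_t cache_line, size_t page_size)
{
	if (size >= page_size)
		return page_size;	/* page size and larger */
	if (size >= cache_line)
		return cache_line;	/* cache line up to page size */
	return sizeof(void *);		/* ASSUMPTION: not stated above */
}
```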

The vmem version isn't quite finished: the vmem_size function required by
zfs needs to be adapted, etc. (And malloc(9) is just replaced by some
arena and is not gathering statistics anymore...)

That's the status report so far.


