tech-kern archive


Re: kmem-pool-uvm




On 05/18/11 20:52, Lars Heidieker wrote:
> On 04/20/11 03:22, YAMAMOTO Takashi wrote:
>> hi,
> 
>> Hi,
> 
>> On 04/14/11 09:05, YAMAMOTO Takashi wrote:
>>>>> why do you want to make subr_kmem use uvm_km directly?
>>>>> to simplify the code?
>>>>> i don't want to see that change, unless there's a clear benefit.
>>>>>
>> The reason was to simplify the code, yes, and to reduce redundancy:
>> in the current implementation vmem allocates PAGE_SIZE chunks from the
>> uvm_km backend for requests <= PAGE_SIZE without utilizing the vacache,
>> and, more importantly, vmem essentially just mirrors the address
>> allocations already made by uvm_map.
>> With the changes I see about 15% fewer kernel map entries.
>>>>> let me explain some background. currently there are a number of
>>>>> kernel_map related problems:
>>>>>
>>>>> A-1. vm_map_entry is unnecessarily large for KVA allocation purpose.
>>>>>
>>>>> A-2. kernel-map-entry-merging is there to solve A-1. but it introduced
>>>>> the allocate-for-free problem. ie. to free memory, you might need to
>>>>> split map-entries thus allocate some memory.
>>>>>
>>>>> A-3. to solve A-2, there is a map-entry-reservation mechanism. it's
>>>>> complicated and broken.
>>>>>
>>>>> B. kernel fault handling is complicated because it needs memory allocation
>>>>> (eg. vm_anon) which needs some trick to avoid deadlock.
>>>>>
>>>>> C. KVA allocation is complicated because it needs memory allocation
>>>>> (eg. vm_map_entry) which needs some trick to avoid deadlock.
>>>>>
>>>>> most of the above can be solved by separating KVA allocation and
>>>>> kernel fault handling. (except C, which will be merely moved to a
>>>>> different place.)
>>>>>
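
To make A-2/A-3 concrete, here is a tiny user-space model of the
allocate-for-free problem (purely illustrative, this is not uvm_map code):
freeing the middle of a merged entry forces a split, and the split needs a
fresh entry structure right while we are trying to free.

    #include <stdio.h>
    #include <stdlib.h>

    /* Toy "map entry" covering [start, end). */
    struct entry {
            unsigned long start, end;
            struct entry *next;
    };

    /*
     * Free [s, e) out of one merged entry.  If the hole lands in the
     * middle, the entry has to be split, i.e. a new entry must be
     * allocated while we are trying to *free* memory.
     */
    static int
    punch_hole(struct entry *ent, unsigned long s, unsigned long e)
    {
            if (s > ent->start && e < ent->end) {
                    struct entry *tail = malloc(sizeof(*tail));

                    if (tail == NULL)
                            return -1;      /* allocate-for-free failed */
                    tail->start = e;
                    tail->end = ent->end;
                    tail->next = ent->next;
                    ent->end = s;
                    ent->next = tail;
            } else if (s == ent->start) {
                    ent->start = e;         /* trim from the front */
            } else {
                    ent->end = s;           /* trim from the back */
            }
            return 0;
    }

    int
    main(void)
    {
            struct entry merged = { 0x1000, 0x9000, NULL };

            punch_hole(&merged, 0x4000, 0x5000);    /* middle free => split */
            for (struct entry *p = &merged; p != NULL; p = p->next)
                    printf("[%#lx, %#lx)\n", p->start, p->end);
            return 0;
    }
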
>> A-1: with vmem_btag being slightly less than half the size of a
>> vm_map_entry...
>> A-2 addresses A-1, and A-3 in turn addresses A-2, but with the pitfall of
>> reintroducing part of A-1: we still have fewer map entries in the map,
>> yet we don't save memory, because all the entries that are not in the map
>> are cached aside for potential merging.
>> In this sense it seems broken and complicated to me.
>> Reducing the overall number of allocated map entries will help here, as
>> the vacaches do.
> 
>> C seems to be inevitable; it's only a question of where it happens...
> 
>> B is a result of having pageable memory (which can fault) and
>> non-pageable memory in the same map, together with the need to allocate
>> non-pageable memory in the event of a page fault.
> 
>>>>> i implemented subr_vmem so that eventually it can be used as the primary
>>>>> KVA allocator. ie. when allocating from kernel_map, allocate KVA from
>>>>> kernel_va_arena first and then, if and only if necessary, register it to
>>>>> kernel_map for fault handling. it probably allows us to remove VACACHE
>>>>> stuff, too. kmem_alloc will be backed by a vmem arena which is backed by
>>>>> kernel_va_arena.
>>>>>
>> Originally I thought about two options, with option one being what my
>> patch does, and option two:
> 
>> If vmem is made the primary kva allocator, we should carve out a
>> kernel heap entirely controlled by vmem, probably one special
>> vm_map_entry in the kernel_map that spans the heap or a submap that
>> never has any map_entries.
>> Essentially separating pageable and non-pageable memory allocations,
>> this would allow for removing the vacaches in the kernel-maps as well
>> as the map-entry-reservation mechanism.
> 
>> Questions that follow:
>> - how to size it properly.....
> 
>>> is this about limiting total size for a particular allocation?
> 
>> - this might be the kmem_map? or two heaps, an interrupt-safe one and
>> a non-interrupt-safe one?
> 
>>> because kernel_va_arena would have its quantum cache disabled,
>>> most users would use another arena stacked on it.
>>> (like what we currently have as kmem_arena.)
>>> interrupt-safe allocations can either use kernel_va_arena directly or
>>> have another arena eg. kmem_arena_intrsafe.
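
To make the stacking concrete, roughly (a sketch only: the vmem_create()
calls below follow the vmem(9)-style argument order of name/base/size/
quantum/import/release/source/qcache_max/flags/ipl, and kva_start/kva_size
are placeholders, not identifiers from any patch):

    /* Primary KVA arena: hands out raw kernel VA, no quantum cache. */
    kernel_va_arena = vmem_create("kernel_va", kva_start, kva_size,
        PAGE_SIZE, NULL, NULL, NULL, 0 /* qcache_max */, VM_SLEEP, IPL_VM);

    /* General-purpose arena stacked on it; small allocations are served
       from its quantum caches instead of touching the map. */
    kmem_arena = vmem_create("kmem", 0, 0, PAGE_SIZE,
        vmem_alloc, vmem_free, kernel_va_arena,
        8 * PAGE_SIZE /* qcache_max */, VM_SLEEP, IPL_NONE);

    /* Interrupt-safe users either go to kernel_va_arena directly or get
       their own stacked arena, e.g.: */
    kmem_arena_intrsafe = vmem_create("kmem_intrsafe", 0, 0, PAGE_SIZE,
        vmem_alloc, vmem_free, kernel_va_arena,
        8 * PAGE_SIZE, VM_SLEEP, IPL_VM);
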
> 
> 
>> I think having two "allocators" (vmem and the vm_map_entries themselves)
>> controlling the kernel_map isn't a good idea, as both have to be kept in
>> sync; at the least, every allocation made via vm_map_entries needs to be
>> made in vmem as well. There is no clear responsibility for either.
> 
>>> i agree that having two allocators for KVA is bad.
>>> my idea is having just one. (kernel_va_arena)
>>> no allocation would be made by vm_map_entries for kernel_map.
>>> kernel_map is kept merely for fault handling.
> 
>>> essentially kva allocation would be:
> 
>>>     va = vmem_alloc(kernel_va_arena, ...);
>>>     if (pageable)
>>>             create kernel_map entry for the va
>>>     else
>>>             ...
>>>     return va;
> 
> 
>> Option two is more challenging and will solve problem B as well as the
>> A problems, while option one solves most of the A problems and leaves B
>> untouched.
> 
>>> sure, it's more challenging and involves more work.
>>> (so it hasn't finished yet. :-)
> 
>>> YAMAMOTO Takashi
> 
> 
>> Lars
> 
> 
> Hi,
> 
> I've made some progress in exploring both options further.
> Two patches, one implementing each option:
> a) http://ftp.netbsd.org/pub/NetBSD/misc/para/kmem-pool-uvm-extent.patch
> b) http://ftp.netbsd.org/pub/NetBSD/misc/para/kmem-pool-vmem-uvm-extent.patch
> 
> Option a has extended kva caches for both kernel_map and kmem_map, with
> interfaces to them that are used by kmem(9), malloc(9) and pool(9), with
> the exception that pool_allocator_meta goes directly to the kmem_map.
> (This means malloc(9) and kmem(9) use kva caches, resulting in a lower
> vm_map_entry count.)
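
For anyone who hasn't opened the patch, the plumbing is roughly of the
following shape; kva_cache_alloc/kva_cache_free are invented names for
illustration only, the patch uses its own identifiers:

    /*
     * Illustrative only: kva_cache_alloc/kva_cache_free stand in for
     * whatever patch (a) actually calls its kva-cache interface.
     */
    static void *
    pool_page_alloc_cached(struct pool *pp, int flags)
    {

            /* kmem(9), malloc(9) and ordinary pools go through the
               per-map kva cache, keeping the vm_map_entry count down. */
            return kva_cache_alloc(kmem_map, pp->pr_alloc->pa_pagesz, flags);
    }

    static void
    pool_page_free_cached(struct pool *pp, void *v)
    {

            kva_cache_free(kmem_map, v, pp->pr_alloc->pa_pagesz);
    }

    /* pool_allocator_meta is the exception: its pages come straight from
       the kmem_map, so the cache never has to allocate from itself. */
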
> 
> Option b has one vm_map_entry in the kernel_map spanning the
> kernel_heap, which in turn is controlled by vmem(9).
> There is the heap_arena, from which the heap_va_arena (with quantum
> caches) imports, as well as an internal arena for vmem's metadata.
> On top of the heap_va_arena are interfaces used by kmem(9), malloc(9)
> and pool(9), with the pool metadata allocator going to vmem's meta arena.
> Originally I had another arena on top of the heap_va_arena, which backed
> the virtual memory with physical pages on import and from which
> malloc(9), kmem(9) and pool(9) allocated; let's call this option c.
> I replaced this arena with interface functions for efficiency reasons.
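
As a picture of that stacking (the label for the metadata arena below is
only descriptive, not the identifier used in the patch):

    kernel_map
      `- one vm_map_entry spanning the kernel_heap
           `- heap_arena                 (raw heap VA, managed by vmem(9))
                |- metadata arena        (vmem boundary tags, pool metadata)
                `- heap_va_arena         (quantum caches)
                     |- kmem(9) interface
                     |- malloc(9) interface
                     `- pool(9) page allocator
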
> 
> Findings after having run the system for a while, with about 1.1 GB
> in the pool(9)s:
> Option a: about 30000 allocated kernel map entries (not in the map, but
> allocated).
> Option b: about 100000 allocated boundary tags.
> Option c: about 400000 allocated boundary tags.
> 
> With boundary tags being about half the size of vm_map_entries, the vmem
> version uses somewhat more memory, but not dramatically more.
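
(Back-of-the-envelope, in vm_map_entry-sized units and taking that 2:1
size ratio at face value: option a ~30000, option b 100000 * 0.5 ~= 50000,
option c 400000 * 0.5 ~= 200000.)
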
> 
> Both versions use a modified kmem(9) that interfaces either with vmem or
> with the extended kva caches, and which returns page-aligned memory for
> allocations of page size and larger, and cache-line-aligned memory for
> allocations between cache-line size and page size.
> This should resolve some problems the xen kernels have.
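
In other words, the allocation path picks its alignment from the request
size, roughly like this (a sketch of the policy only, not code from either
patch; KMEM_MIN_ALIGN stands for whatever minimum alignment kmem uses):

    size_t align;

    if (size >= PAGE_SIZE)
            align = PAGE_SIZE;          /* page-aligned, as e.g. Xen wants */
    else if (size >= COHERENCY_UNIT)
            align = COHERENCY_UNIT;     /* cache-line aligned */
    else
            align = KMEM_MIN_ALIGN;     /* small allocations: minimum align */
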
> 
> The vmem version isn't quite finished; the vmem_size function required by
> zfs still needs to be adapted, etc. (And malloc(9) is just replaced by an
> arena and no longer gathers statistics...)
> 
> So much for the status report.
> 
> Greetings,
> Lars

Hi,

I suggest using option a for the time being and, once option b is ready,
replacing the uvm_km* functions and their kva-caches with the vmem
implementation.
This will give us the benefits of fewer vm_map_entries and a kmem(9) that
does page-aligned allocations.

Lars

-- 
------------------------------------

Mystical explanations:
Mystical explanations are regarded as deep;
the truth is that they are not even superficial.

   -- Friedrich Nietzsche
   [ Die Fröhliche Wissenschaft, Book 3, 126 ]

