tech-kern archive


Re: kmem-pool-uvm




On 04/20/11 03:22, YAMAMOTO Takashi wrote:
> hi,
> 
> Hi,
> 
> On 04/14/11 09:05, YAMAMOTO Takashi wrote:
>>>> why do you want to make subr_kmem use uvm_km directly?
>>>> to simplify the code?
>>>> i don't want to see that change, unless there's a clear benefit.
>>>>
> The reason was to simplify the code, yes, and to reduce redundancy:
> in the current implementation vmem allocates PAGE_SIZE memory from
> the uvm_km backend for requests <= PAGE_SIZE without utilizing the
> vacache, and, more importantly, vmem essentially just mirrors the
> address allocations already made by uvm_map.
> With the changes I see about 15% fewer kernel map entries.
>>>> let me explain some background. currently there are a number of
>>>> kernel_map related problems:
>>>>
>>>> A-1. vm_map_entry is unnecessarily large for KVA allocation purpose.
>>>>
>>>> A-2. kernel-map-entry-merging is there to solve A-1. but it introduced
>>>> the allocate-for-free problem. ie. to free memory, you might need to
>>>> split map-entries thus allocate some memory.
>>>>
>>>> A-3. to solve A-2, there is a map-entry-reservation mechanism. it's
>>>> complicated and broken.
>>>>
>>>> B. kernel fault handling is complicated because it needs memory allocation
>>>> (eg. vm_anon) which needs some trick to avoid deadlock.
>>>>
>>>> C. KVA allocation is complicated because it needs memory allocation
>>>> (eg. vm_map_entry) which needs some trick to avoid deadlock.
>>>>
>>>> most of the above can be solved by separating KVA allocation and
>>>> kernel fault handling. (except C, which will merely be moved to a
>>>> different place.)
>>>>
> A-1: with vmem_btag being slightly less than half the size of
> vm_map_entry...
> A-2 solves A-1, but A-3 solves A-2 with the pitfall of reintroducing
> part of A-1: we still have fewer map entries in the map, but we don't
> save memory, because all the entries not in the map are cached aside
> for potential merging.
> In this sense it seems broken and complicated to me.
> Reducing the overall number of allocated map_entries will help here,
> as the vacaches do.
> 
> C seems to be inevitable; it's only a question of where it happens...
> 
> B is a result of having pageable memory (which can fault) and
> non-pageable memory in the same map, with the need to allocate
> non-pageable memory in the event of a page fault.
> 
>>>> i implemented subr_vmem so that eventually it can be used as the primary
>>>> KVA allocator. ie. when allocating from kernel_map, allocate KVA from
>>>> kernel_va_arena first and then, if and only if necessary, register it to
>>>> kernel_map for fault handling. it probably allows us to remove VACACHE
>>>> stuff, too. kmem_alloc will be backed by a vmem arena which is backed by
>>>> kernel_va_arena.
>>>>
> Originally I thought about two options, with option one being what my
> patch does, and option two being:
> 
> If vmem is made the primary kva allocator, we should carve out a
> kernel heap entirely controlled by vmem, probably one special
> vm_map_entry in the kernel_map that spans the heap or a submap that
> never has any map_entries.
> This would essentially separate pageable and non-pageable memory
> allocations, and would allow removing the vacaches in the kernel maps
> as well as the map-entry-reservation mechanism.
> 
> Questions that follow:
> - how to size it properly...
> 
>> is this about limiting total size for a particular allocation?
> 
> - this might be the kmem_map? or two heaps, an interrupt-safe one and
> a non-interrupt-safe one?
> 
>> because kernel_va_arena would have quantum caching disabled,
>> most users would use another arena stacked on it.
>> (like what we currently have as kmem_arena.)
>> interrupt-safe allocations can either use kernel_va_arena directly or
>> have another arena eg. kmem_arena_intrsafe.
> 
> 
> I think having two "allocators" (vmem and the vm_map_entries themselves)
> controlling the kernel_map isn't a good idea, as both have to be kept in
> sync; at the least, every allocation made via vm_map_entries needs to
> be made in vmem as well. There is no clear responsibility for either.
> 
>> i agree that having two allocators for KVA is bad.
>> my idea is having just one. (kernel_va_arena)
>> no allocation would be made by vm_map_entries for kernel_map.
>> kernel_map is kept merely for fault handling.
> 
>> essentially kva allocation would be:
> 
>>      va = vmem_alloc(kernel_va_arena, ...);
>>      if (pageable)
>>              create kernel_map entry for the va
>>      else
>>              ...
>>      return va;
> 
> 
> Option two is more challenging and will solve problem B and the A's,
> while option one solves most of the A's, leaving B untouched.
> 
>> sure, it's more challenging and involves more work.
>> (so it hasn't finished yet. :-)
> 
>> YAMAMOTO Takashi
> 
> 
> Lars
> 

Hi,

I've made some progress exploring both options further.
Here are two patches, one implementing each option:
a) http://ftp.netbsd.org/pub/NetBSD/misc/para/kmem-pool-uvm-extent.patch
b) http://ftp.netbsd.org/pub/NetBSD/misc/para/kmem-pool-vmem-uvm-extent.patch

Option a adds extended kva caches for both kernel_map and kmem_map, with
interfaces to them that are used by kmem(9), malloc(9) and pool(9), with
the exception that pool_allocator_meta goes directly to the kmem_map.
(This means malloc(9) and kmem(9) use kva caches, resulting in a lower
vm_map_entry count.)
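
To make the layering a bit more concrete, here is a rough sketch of how
a pool(9) page allocator could draw from such a kva cache. The
kva_cache_* names and the kernel_kva_cache handle are made up for
illustration and are not the interface from the patch; the cache is
assumed to hand back wired, page-aligned KVA the way uvm_km_alloc()
would, and pool_allocator_meta would keep going directly to the
kmem_map, as noted above.

/* Rough sketch only; kva_cache_* is a hypothetical stand-in. */
#include <sys/param.h>
#include <sys/pool.h>

struct kva_cache;
extern struct kva_cache *kernel_kva_cache;	/* fronts kernel_map */
vaddr_t	kva_cache_alloc(struct kva_cache *, vsize_t, int /* waitok */);
void	kva_cache_free(struct kva_cache *, vaddr_t, vsize_t);

/* pool(9) back-end allocator drawing KVA from the cache, not the map. */
static void *
pool_page_alloc_kva(struct pool *pp, int flags)
{
	vaddr_t va;

	va = kva_cache_alloc(kernel_kva_cache, pp->pr_alloc->pa_pagesz,
	    (flags & PR_WAITOK) != 0);
	return (va != 0) ? (void *)va : NULL;
}

static void
pool_page_free_kva(struct pool *pp, void *v)
{
	kva_cache_free(kernel_kva_cache, (vaddr_t)v, pp->pr_alloc->pa_pagesz);
}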

Option b has one vm_map_entry in the kernel_map spanning the
kernel_heap, which in turn is controlled by vmem(9).
There is the heap_arena, from which the heap_va_arena (with quantum
caches) imports, as well as an internal arena for vmem's metadata.
On top of the heap_va_arena sit interfaces used by kmem(9), malloc(9)
and pool(9), with the pool metadata allocator going to vmem's meta arena.
Originally I had another arena on top of the heap_va_arena, which backed
the virtual memory with physical pages on import and from which
malloc(9), kmem(9) and pool(9) allocated; let's call this option c.
I replaced this arena with interface functions for efficiency reasons.
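
As an illustration of the arena stacking, a sketch along these lines
(the vmem_create() prototype is written from memory, the import/release
callbacks are placeholders, and the qcache_max value is arbitrary, so
treat this as a picture of the layering rather than the patch's code):

#include <sys/param.h>
#include <sys/vmem.h>

/*
 * kernel_map
 *   `- single vm_map_entry spanning the kernel heap
 *        `- heap_arena        owns the heap VA, no quantum caches
 *             |- heap_va_arena quantum caches; kmem(9)/malloc(9)/pool(9)
 *             `- meta arena    vmem's own boundary-tag metadata
 */
vmem_t *heap_arena, *heap_va_arena;

/* placeholder import/release callbacks; exact signatures may differ */
static vmem_addr_t heap_import(vmem_t *, vmem_size_t, vm_flag_t);
static void heap_release(vmem_t *, vmem_addr_t, vmem_size_t);

void
kernel_heap_init(vaddr_t heap_start, vsize_t heap_size)
{
	/* arena owning the VA covered by the single kernel_map entry */
	heap_arena = vmem_create("kernelheap", heap_start, heap_size,
	    PAGE_SIZE, NULL, NULL, NULL, 0, VM_SLEEP, IPL_NONE);

	/*
	 * general-purpose arena importing from heap_arena, with quantum
	 * caches so small allocations never hit heap_arena directly
	 */
	heap_va_arena = vmem_create("heapva", 0, 0, PAGE_SIZE,
	    heap_import, heap_release, heap_arena,
	    8 * PAGE_SIZE, VM_SLEEP, IPL_NONE);
}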

Findings after running the system for a while, with about 1.1 GB
in the pool(9)s:
Option a: about 30000 allocated kernel map_entries (allocated, though
not all in the map).
Option b: about 100000 allocated boundary tags.
Option c: about 400000 allocated boundary tags.

With boundary tags being about half the size of vm_map_entries, the vmem
version uses somewhat more memory, but not much more.
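Back-of-the-envelope, taking "about half the size" at face value:
option b's 100000 boundary tags correspond to roughly 50000
map-entry-equivalents versus option a's 30000 allocated map entries,
i.e. roughly 1.7x the metadata memory; option c's 400000 tags would be
about 6.7x.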

Both versions use a modified kmem(9) that interfaces either with vmem or
with the extended kva caches; it returns page-aligned memory for
allocations of page_size and larger, and cache-line-aligned memory for
allocations between the cache_line size and page_size.
This should resolve some problems the xen kernels have.
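
A minimal sketch of that size rounding, in the spirit of kmem(9)'s
internal rounding (illustrative only, the actual code in the patches may
differ; COHERENCY_UNIT is NetBSD's cache-line constant):

#include <sys/param.h>	/* PAGE_SIZE, COHERENCY_UNIT, roundup() */

/*
 * Round the request so the per-size caches naturally return
 * page-aligned blocks for >= PAGE_SIZE and cache-line-aligned
 * blocks for sizes between the cache line and a page.
 */
static size_t
kmem_roundup_size_sketch(size_t size)
{
	if (size >= PAGE_SIZE)
		return roundup(size, PAGE_SIZE);
	if (size > COHERENCY_UNIT)
		return roundup(size, COHERENCY_UNIT);
	return roundup(size, sizeof(long));	/* small allocations */
}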

The vmem version isn't quite finished; the vmem_size function required by
zfs still needs to be adapted, etc. (And malloc(9) is just replaced by an
arena and no longer gathers statistics...)

That's the status report so far.

Greetings,
Lars

