Re: netbsd version of compressed caching

To: "Hubert Feyrer" <hubert%feyrer.de@localhost>, tech-embed%netbsd.org@localhost
Subject: Re: netbsd version of compressed caching
From: "Selcuk AYA" <ayaselcuk%gmail.com@localhost>
Date: Fri, 12 Dec 2008 15:53:52 -0800

Here is my initial thoughts about how things should be done:

a) The whole thing will be a pseudo block device driver implemented as
an lkm. Block size of the device will be set to the page size so that
IO will be done in page aligned and page sized blocks.  Size of the
device will be the number of pages that the block device is configured
to compress. The size can be set through an ioctl command.

As an answer to Adam's question, I am not sure if this can be
generalized to compress any block device. It is possible to  implement
a pseudo block device driver which compresses sectors of block devices
and  remaps them (kind of like an lvm), but the size of the block
device would change depending on the data as compression ratio would
change and that would be a pain.

b) Multiple compression algorithms can be used. A table of algorithms
can be held and which algorithm to use can be chosen through an ioctl
command I guess.  When a page is compressed, if the compression does
not work and size gets bigger, the page will be stored as is.

c) Memory management will, naturally,  try to avoid fragmentation. I
have two possible implementations in my mind. Both of them try to take
advantage of the fact that, when a page is compressed, it can be
stored in non-contig chunks of memory.

1)There will be  a list of free chunks. Metadata for each chunk will
be kept at the head and tail of each chunk. Links for the lists and
size of the chunk will be kept at the head and whether it is free and
size will be kept at the tail(to make coalescing easier). When there
is not enough memory a page will be allocated from the kernel malloc
arena and will be added to a free list as a chunk.  When a compressed
page is to be written, enough chunks will be gotten from the free list
and put on the list for the compressed page. When chunks are gotten
from the free list, they will be broken up into two chunks  if all the
chunk is not needed. New  chunk resulting from the break up will be
put on the free list. When a compressed page is released, its chunks
will be put back on the free list. If possible, chunks will be
coalesced once freed.

The problem here is that there is not a good bound on the overall
metadata overhead. There might be many small chunks. When we are
allocating memory from the system, If we allocate multiple pages at a
time(4-8 virtually contig pages) and insert it as a chunk, then we can
make chunks cross page boundaries and decrease number of chunks, hence
the overhead of metadata.

An allocated area will be freed back to the system as soon as it is free.

2) This scheme keeps the metadata overhead predictable while still
trying to avoid fragmentation. (I got the basic idea from the paper at
http://www.cs.utexas.edu/users/oops/compressed-caching/index.html)
Virtual memory areas (4 to 8 virtually contig pages) will be allocated
and will be divided into small chunks of say 128 bytes. For each area,
a "separate" metadata memory  will be allocated and will be used to
link all free chunks in the area together using their indices in the
area. Each area will point to the first chunk that is free. Memory
from an area will be allocated using this pointer and free chunks
list.

Chunks for the compressed page will come from a single area.  Areas
upto the sizes of a page will be organized into power of two lists and
search in each list can be done with a first fit allocation. Areas
with sizes greater then page size will be put in a separate list so
that the algorithm  tries its best to use the same set of areas rather
than allocating from many areas( Remember that the allocations will be
of atmost a page size). An area will be freed back to the system as
long as it is free.

d) One thing i am worried about is size of the kernel malloc arena.
Its size will limit the size of the compressed cache we have. But to
deal with physical pages and mapping them when necessary does not
seems very attractive to me. Any comments?

thanks
Selcuk

On Thu, Dec 11, 2008 at 4:12 AM, Hubert Feyrer <hubert%feyrer.de@localhost> 
wrote:
>
> [added tech-kern]
>
> LKM/module or not shouldn't really be a question for a pseudo device, AFAIU
> the new modules framework will allow to easily compile code into the kernel
> or use as a module. I wouldn't worry too much there.
>
> About compressino algorithm, have a look at the vnd(4) driver, which has an
> option to use images that are compressed with in the "cloop" format, which
> internally uses the gzip algorithm (IIRC).
>
> I think the most interesting part will be management of "free" space, as you
> can't assume the compressed blocks to be of a fixed size - if you replace
> one (compressed) "page" with another one, the new one may not fit as it's
> too big. vnd(4) avoids this by insisting that compressed images need to be
> read-only (and compressed in userland, see vndcompress(1)).
>
>
>  - Hubert
>
>
> On Wed, 10 Dec 2008, Selcuk AYA wrote:
>>
>> Hi,
>> I found about the compressed caching project on the netbsd projects
>> page.  The page shows tech-embed mailing list as the mailing list to
>> contact. I have no NETBSD experience but I have been poking around the
>> code to get an understanding of uvm interfaces and pseudo drivers and
>> I think I can get the job done. I have also looked at the Linux
>> implementation of compressed caching. I belive there are three ways to
>> implement compressed caching in NETBSD:
>>
>> a) as a pseudo device inside the kernel. This pesuodo device would be
>> used as a swap partition. Behind the scenes, it would compress pages
>> which need to be swapped out and store them in memory.
>> b) similar to a) but this would be a pseudo driver implemented as an
>> loadable kernel module.
>>
>> For these two options, I found out that md.c (ram disk driver) in the
>> kernel serves as a good starting point to write the pseudo driver.
>>
>> c)Inside the kernel, swap pager would need to be made aware of
>> compressed caching. I imagine swap pager keeps a map to keep track of
>> where swapped out pages are on the swap device. Some of these entries
>> would point to in memory compressed pages.
>>
>> For all these options, compressed caching will have to manage its own
>> memory and keep track of variable sized chunks (I do not have a
>> detailed idea about this yet). I would prefer to go with options a) or
>> b) as I do not see a an advantage of making the swap pager aware of
>> compressed caching. Would an lkm be OK for an embedded system?
>>
>>
>> Being new to the open source world, I am worried about the licensing
>> issues. Linux compressed caching uses LZO algorithm for compression.
>> LZO is released under GPL so I guess we cannot use it in NETBSD. But
>> there are are other alternatives which do not bear the sign of GPL
>> (http://www.cs.utexas.edu/users/oops/compressed-caching/index.html) .
>> If I use a non GPL licenced compress/decompress algorithm and
>> re-implement compressed caching as above for NETBSD, do I avoid
>> licencing issues?
>>
>>
>> thanks.
>>
>

Follow-Ups:
- Re: netbsd version of compressed caching
  - From: Adam Hamsik
- Re: netbsd version of compressed caching
  - From: Thor Lancelot Simon

References:
- Re: netbsd version of compressed caching
  - From: Hubert Feyrer

Prev by Date: Re: raidframe oddity, take three (kernel panic!)
Next by Date: Re: netbsd version of compressed caching
Previous by Thread: Re: netbsd version of compressed caching
Next by Thread: Re: netbsd version of compressed caching
Indexes:

Home | Main Index | Thread Index | Old Index