tech-kern: Re: tmpfs: Internal representation of data

Subject: Re: tmpfs: Internal representation of data
To: Julio M. Merino Vidal <jmmv84@gmail.com>
From: Matt Thomas <matt@3am-software.com>
List: tech-kern
Date: 07/19/2005 12:05:02
Julio M. Merino Vidal wrote:
> Hi all,
> 
> [ I'm CCing my mentors for the tmpfs project ]
> 
> The tmpfs code is up to the point where I have to start
> implementing the vnode operations.  To do this, I have
> to decide how to organize the file data in memory as
> well as all other information needed to manage it.
> 
> After thinking about this for a while, it seems that the best
> way to do this is to follow a layout similar to the one used
> for existing on-disk file systems.  That is, I need:
> - A set of nodes that describe files.  These could be like
>   regular inodes.
> - A set of blocks that store file contents.  These could store
>   directories as well (i.e., the "file" representing the directory
>   contents.).

One thing you may want to consider is that eventually that the tmpfs
maybe replace the current md-root mechanism but using an external tool
to write a tmpfs into a kernel.

> Suppressing the former (merging it with directory contents,
> thus becoming something like the FAT) means that
> implementing hard links becomes very complex, so the
> separation is worth it.

Directories are just files and should be kept in the format returned
by getdents(2).

> Now, to store these two data structures in memory.  Before
> I started coding, I thought I could use malloc/free to allocate
> and deallocate blocks of memory on the fly.  I.e., when a new
> block is needed, I do a malloc and use it.  When it is useless,
> I do a free and its associated memory is released.

You have two types of file data, those that are > page size and
those less than.  These two require different approaches.  I would
think that for complete pages, you might want to keep just physical
pages that are unmapped unless needed.  For < page size, you might
want to keep various pools to store appropriately sized items.

> This is inappropriate because malloc/free operate on wired
> memory.  Furthermore, I guess malloc is an expensive
> operation and should not be used intensively to avoid its
> overhead (think about how tmpfs could abuse it).

I would use pools, not malloc/free.

> Therefore, I started reading a bit about UVM, and although I
> don't understand it very well yet, it seems to me that it is
> possible to map a region of virtual (unwired) memory and use
> it at will (just as if you had done a malloc).  So, for now, I'm
> using malloc/free to allocate some (few) memory blocks, and
> I hope that changing this to use unwired memory will be as
> easy as doing some call to UVM to request a memory region
> and use pointers inside it for our own use.

Again, don't use malloc/free.  Use pools.

> With this in mind, it seems that a good way to map the data
> structures said before into these memory blocks could be:
> - Allocate a memory block of no. nodes * sizeof node.  This
>   could be used to hold all nodes.  It could be handled with
>   a bitmap.
> - Allocate a memory block of no. blocks * sizeof blocks.  As
>   before, with an associated bitmap describing its contents.
> (Note that the number of blocks and nodes is configured by
> the user at mount time.)

Use pools which will do this for you automagically.

> The advantages of this structure are:
> - Access to blocks and nodes is constant time, as they are
>   located in known memory positions (it's a vector).

This is wrong.  tmpfs is variable sized, not fixed.  So having
a fixed layout would be wrong.

> - Allocation of blocks and nodes is of linear time due to the
>   location of empty holes in the bitmap.  Can be solved
>   with a pool of available entries (at a later time).

If you use pools, it'll do the heavy lifting.

> - Deallocation of blocks and nodes is constant time.

I don't see that as very important.

> - It's easy to grow (and to some extent, shrink) the file
>   system at will.
> - The implementation is straightforward.
> - It is very similar to traditional Unix file systems, so it will be
>   easy to understand.

You have the advantage of being able to storing the FS using
simple native data structures without the need for indirection.
Why not take advantage of them?  The hard part is noting the
where the file data is stored since that's bound to be variably
sized.

> The code in the CVS did exactly this some revisions before
> HEAD.  In a later change, I undid the separation explained
> here and used blocks for everything.  But I'm starting to
> regret that decision because it will be very complex to store
> nodes if I don't want to waste much memory.
> 
> What do you think?  Can you come up with better ideas or
> suggestions?  Anything inherently wrong in this rationale?

I think you need to spend more time on your data structures
since that's what everything comes down to.


-- 
Matt Thomas                     email: matt@3am-software.com
3am Software Foundry              www: http://3am-software.com/bio/matt/
Cupertino, CA              disclaimer: I avow all knowledge of this message.