Subject: Re: MTD devices in NetBSD
To: Bill Studenmund <wrstuden@netbsd.org>
From: Garrett D'Amore <garrett_damore@tadpole.com>
List: tech-kern
Date: 03/23/2006 10:26:31
Bill Studenmund wrote:
> On Thu, Mar 23, 2006 at 09:12:27AM -0800, Garrett D'Amore wrote:
>   
>> Jason Thorpe wrote:
>>     
>>> On Mar 22, 2006, at 4:17 PM, Garrett D'Amore wrote:
>>>
>>>       
>> I'm not saying we shouldn't have a block abstraction available.  Indeed,
>> I want to create one.  But what I am saying is that a filesystem might
>> do better if it can operate below that abstraction.
>>     
>
> Yes!
>
> We can do this even within a block device.
>
> Well-chosen calls to your strategy routine will work smoothly, and you have 
> an ioctl interface for things like erase and whatever other calls you 
> need.
>
> I guess a way to put it is to think of using one interface in two 
> different ways as opposed to an interface "below" another one.
>   

I've been thinking about this as well.  I think this idea implies that
the "block" size of these devices would have to match the native sector size.

Mapping blocks to sectors 1:1 also means that for a lot of filesystems
you are going to have a lot of waste (unless the filesystem lets files
occupy less than a full device block) -- and this could be very, very
undesirable on some systems.  (E.g. a 128K minimum file size on 4MB of
flash limits you to only 32 files; 16MB only gives 128 files.)  128K
sector sizes are rare, but 64K sector sizes are *very* common, so you
get 256 files in the 16MB "common" case.

Hence, I think 1:1 block/sector mapping is a poor (even unworkable) choice.

So if the abstraction is going to use a smaller block size -- say 512
bytes -- to get good allocation, we run into other problems.

For the rest of the discussion, let's assume a 64K sector size (the most
common NOR flash sector size, I think):

A naive implementation would turn every block update into an
erase/rewrite of the containing sector.  Obviously this is bad, because
writing (or updating) a 64K file as 128 separate 512-byte block writes
now requires 128 erase cycles.  Erase takes a long time, and wears down
the flash.  This is unworkable.
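
Roughly, the naive block path looks like this.  (All of the names here
-- struct flashdev, flash_read_sector(), and so on -- are made up
purely for illustration, not an existing NetBSD interface, and the
buffer handling is simplified.)

#include <sys/types.h>
#include <stdint.h>
#include <string.h>

#define SECSIZE (64 * 1024)     /* erase sector size */
#define BLKSIZE 512             /* block-device block size */

/* Hypothetical driver primitives, just for the sake of the sketch. */
struct flashdev;
int flash_read_sector(struct flashdev *, off_t, uint8_t *);
int flash_write_sector(struct flashdev *, off_t, const uint8_t *);
int flash_erase_sector(struct flashdev *, off_t);

/*
 * Naive path: every 512-byte block write erases and rewrites the
 * whole enclosing 64K sector, so a 64K file written as 128 blocks
 * costs 128 erase cycles.
 */
int
naive_write_block(struct flashdev *sc, off_t blkno, const uint8_t *buf)
{
        static uint8_t sec[SECSIZE];    /* buffer handling simplified */
        off_t secno = (blkno * BLKSIZE) / SECSIZE;
        size_t off = (size_t)((blkno * BLKSIZE) % SECSIZE);

        flash_read_sector(sc, secno, sec);              /* read             */
        memcpy(sec + off, buf, BLKSIZE);                /* modify one block */
        flash_erase_sector(sc, secno);                  /* erase (!)        */
        return flash_write_sector(sc, secno, sec);      /* write back       */
}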

So a non-naive implementation has to look at the bits it is updating
to decide whether or not an erase is necessary.  This means knowing the
"set/clear" behavior of the bits, which isn't a problem.  (The devices
I've seen all come back "set" after an erase, and a program operation
can only clear individual bits.)
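
Under that model (erase sets bits, a program can only clear them), the
check itself is trivial: the old contents can be reused without an
erase only if the write never needs to turn a 0 back into a 1.  A
minimal sketch:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumes NOR-style behavior: erase leaves all bits set (0xff), and a
 * program operation can only clear bits.  An erase is needed iff the
 * new data wants some bit set that is currently clear.
 */
static bool
needs_erase(const uint8_t *olddata, const uint8_t *newdata, size_t len)
{
        for (size_t i = 0; i < len; i++) {
                if ((olddata[i] & newdata[i]) != newdata[i])
                        return true;
        }
        return false;
}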

But now, when I'm writing a 64K file I'm going to have to do 128 of
those read-then-write cycles.  And if the sector unfortunately has a
single bit clear near its end, I don't detect that until I get there,
and I wind up having to do a full read-modify-write (erase and rewrite
of the whole sector) even after I've done all the work to try to avoid it.

If I operate on sectors natively, and expose that to the filesystem,
then the filesystem can do an upfront check, erase the sector if needed,
and *then* do the write, all at once.  (Assuming again we are writing a
64K file.)  Since the filesystem knows it's a 64K write, it can do "the
right thing".

I think this means that the filesystem should *really* have a lot more
direct control over the device, and be able to operate on sectors rather
than blocks.  (And we've already ruled out a 1:1 sector/block mapping,
at least if you also want to be able to put an ordinary filesystem down
on these devices, e.g. as a read-only filesystem.)

Therefore, I'm coming to the conclusion that we need to expose *sectors*
to a flash-aware filesystem, and the block abstraction is poor for these
filesystems.
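
Purely as an illustration (this is not an existing NetBSD API, and I'm
not proposing exact names), such a sector-level interface might look
something like:

struct flash_ops {
        size_t  fo_sector_size;         /* erase unit, e.g. 64K on NOR */
        size_t  fo_nsectors;            /* number of sectors on device */
        int     (*fo_read)(void *cookie, off_t secno, void *buf);
        int     (*fo_write)(void *cookie, off_t secno, const void *buf);
        int     (*fo_erase)(void *cookie, off_t secno);
};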

Am I missing something here?

    -- Garrett

> Take care,
>
> Bill
>   


-- 
Garrett D'Amore, Principal Software Engineer
Tadpole Computer / Computing Technologies Division,
General Dynamics C4 Systems
http://www.tadpolecomputer.com/
Phone: 951 325-2134  Fax: 951 325-2191