tech-kern: Re: MTD devices in NetBSD

Subject: Re: MTD devices in NetBSD
To: Eduardo Horvath <eeh@netbsd.org>
From: Garrett D'Amore <garrett_damore@tadpole.com>
List: tech-kern
Date: 03/22/2006 09:50:03
Eduardo Horvath wrote:
> Time to ask annoying questions.
>
> On Tue, 21 Mar 2006, Garrett D'Amore wrote:
>
>   
>> Okay, I'm ready to start getting *serious* about supporting MTD (flash!)
>> devices in NetBSD.
>>
>> I've been doing a lot of research.   I want to sum it up here, along
>> with my thoughts.  I'm *particularly* interested in what core@ has to
>> say, so that perhaps we can move to actually implementing something
>> "real".  I am aware that others are starting to work on this problem,
>> but I think there is a lack of hard technical direction, and I think it
>> is important to get the "framework" right:
>>
>> MTD devices differ from regular "block" devices in some important aspects:
>>
>>    1. you have to erase a block before you write it (read-modify-write
>>       cycle)
>>    2. They need wear-leveling (writes "wear" the bits out) to prolong
>>       device life
>>    3. Many NOR devices can map directly into system memory while in read
>>       mode
>>    4. Generally NAND devices cannot do this, and usually need special
>>       handling
>>    5. You may have to do bad block management (some block devices
>>       have to do this)
>>    6. Generally, most filesystems designed for use with block devices
>>       won't work so well
>>    7. Some devices support "execute in place", while some do not
>>    8. CompactFlash doesn't count, because it looks like an IDE disk. :-)
>>     
>
> How does the above differ from `regular "block" devices'?
>
> The erasure issue is only visible if you allow sub-block writes.  Is that 
> something you're really thinking about doing?  Is there any reason why you 
> would want to leave a block erased without immediately writing useful data 
> to it?
>   

Not that I can think of.  But you can't just "write" to it; you have to
erase it first.  This means that there is a window of time when the
block does not contain valid data.  With an ordinary block device, this
isn't the case -- the block always contains valid data.

Robust filesystems want to understand this, and manage it properly.
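
To make that window concrete, here is a minimal sketch of a simulated
erase block.  It assumes the usual flash behaviour where erase sets every
bit to 1 and programming can only clear bits.  The names (sim_block,
sim_erase, sim_program, sim_rewrite) and the 64-byte block size are made
up for illustration; this is not an existing NetBSD interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_BLOCK_SIZE 64u	/* tiny erase block, for illustration only */

struct sim_block {
	uint8_t data[SIM_BLOCK_SIZE];
	unsigned erase_count;	/* number of erase cycles so far (wear) */
};

/* Erase sets every bit to 1; the old contents are gone from here on. */
void sim_erase(struct sim_block *b)
{
	memset(b->data, 0xff, sizeof(b->data));
	b->erase_count++;
}

/*
 * Programming can only clear bits (1 -> 0).  Returns 0 on success, or -1
 * if the range is invalid or the request would need a 0 -> 1 transition,
 * which only an erase can provide.
 */
int sim_program(struct sim_block *b, size_t off, const uint8_t *src,
    size_t len)
{
	size_t i;

	if (off > SIM_BLOCK_SIZE || len > SIM_BLOCK_SIZE - off)
		return -1;
	for (i = 0; i < len; i++)
		if ((b->data[off + i] & src[i]) != src[i])
			return -1;
	for (i = 0; i < len; i++)
		b->data[off + i] &= src[i];
	return 0;
}

/* Rewrite a whole block: erase, then program the new contents. */
int sim_rewrite(struct sim_block *b, const uint8_t *src)
{
	sim_erase(b);
	/* Window: the block now holds neither the old nor the new data. */
	return sim_program(b, 0, src, SIM_BLOCK_SIZE);
}
```

If power is lost between sim_erase() and the end of sim_program(), the
block is left erased or half-programmed, which is the case a filesystem
has to be prepared for.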

Also, you want to treat erase specially, e.g. for truncation or "safe"
deletion of old data.  What you do *not* want to do is rewrite a bunch
of zeros to the device, because it creates the need for another erase
cycle later.

Additionally, erase block sizes are often huge (64 KB or 128 KB).  You
don't want to use that as the basis for your filesystem block size, I think.

And on some devices, as Allen pointed out, the blocks are not necessarily
uniformly sized.  (Boot-sectored flash.)
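
As a rough sketch of what non-uniform blocks mean for a framework: the
erase block containing a given byte offset has to be looked up in a table
of regions rather than computed with one division.  The layout below
(8 x 8 KB followed by 63 x 64 KB, 4 MB total) is a hypothetical bottom-boot
part, and erase_region and erase_block_lookup are illustrative names, not
an existing API.  It also assumes the whole device fits in 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct erase_region {
	uint32_t nblocks;	/* number of erase blocks in this region */
	uint32_t block_size;	/* size of each erase block, in bytes */
};

/* Hypothetical bottom-boot layout: 8 x 8 KB, then 63 x 64 KB. */
const struct erase_region example_layout[] = {
	{ 8, 8 * 1024 },
	{ 63, 64 * 1024 },
};
#define EXAMPLE_NREGIONS \
	(sizeof(example_layout) / sizeof(example_layout[0]))

/*
 * Find the erase block containing byte offset `off`.  On success returns 0
 * and stores the block's start offset and size; returns -1 if `off` lies
 * beyond the end of the device.
 */
int erase_block_lookup(const struct erase_region *r, size_t nregions,
    uint32_t off, uint32_t *startp, uint32_t *sizep)
{
	uint32_t base = 0;	/* start offset of the current region */
	size_t i;

	for (i = 0; i < nregions; i++) {
		uint32_t span = r[i].nblocks * r[i].block_size;

		if (off - base < span) {
			uint32_t idx = (off - base) / r[i].block_size;

			*startp = base + idx * r[i].block_size;
			*sizep = r[i].block_size;
			return 0;
		}
		base += span;
	}
	return -1;
}
```

An erase request for an arbitrary offset would first be rounded to the
block found this way, so the same call erases 8 KB near the bottom of the
device and 64 KB everywhere else.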

> Wear leveling and bad block replacement have traditionally been filesystem 
> issues since they deal with the layout of the data on the device.  (There 
> once was a time when disks did not automagically do bad block replacement 
> for you, and some of the code may still be in the source tree.)  In order 
> to manage these features you will need to store data structures somewhere, 
> which essentially means implementing the equivalent of a filesystem, 
> whether you want to call it that or not.  You may want to investigate 
> using layered filesystems to implement this rather than a device 
> framework.  Layered filesystems can give more flexibility.
>   

I'm proposing letting the filesystem solve this problem, not the
framework, anyway.

> What operations do you think a flash memory device needs to support beyond 
> the standard open(), close(), strategy(), and ioctl(), or for the 
> character device, open(), close(), read(), write(), mmap(), and ioctl()?
>   

Honestly, I've not got a specific proposal for operations yet, but I
imagine that there will be additional operations for querying
information about the flash (possibly handled via ioctl?), switching the
flash between read and write modes, and a separate erase function.

These functions need to operate on sectors natively.

You can emulate a lot of the above functionality on top of the
primitives, but some things may want direct access to the primitives.
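
Purely as a strawman, and not a proposal, the primitives and one emulated
operation might look something like the sketch below.  Every name here
(flash_ops, flash_info, flash_emul_write, the ram_* backend) is
hypothetical, the geometry is a toy one, and a RAM array stands in for a
real chip so that the example is self-contained.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NSECTORS	4u	/* toy geometry for the RAM-backed example */
#define SECTOR_SIZE	32u

enum flash_mode { FLASH_MODE_READ, FLASH_MODE_WRITE };
struct flash_info { uint32_t nsectors, sector_size; };

/* Hypothetical sector-level primitives; none of these exist in NetBSD. */
struct flash_ops {
	void (*getinfo)(void *, struct flash_info *);
	int (*setmode)(void *, enum flash_mode);
	int (*erase)(void *, uint32_t sector);
	int (*read)(void *, uint32_t sector, uint8_t *buf);
	int (*program)(void *, uint32_t sector, const uint8_t *buf);
};

/* RAM-backed stand-in for a real chip driver. */
struct ram_flash {
	enum flash_mode mode;
	uint8_t mem[NSECTORS][SECTOR_SIZE];
};

static void ram_getinfo(void *v, struct flash_info *fi)
{
	(void)v;
	fi->nsectors = NSECTORS;
	fi->sector_size = SECTOR_SIZE;
}

static int ram_setmode(void *v, enum flash_mode mode)
{
	((struct ram_flash *)v)->mode = mode;
	return 0;
}

static int ram_erase(void *v, uint32_t s)
{
	struct ram_flash *rf = v;
	if (rf->mode != FLASH_MODE_WRITE || s >= NSECTORS)
		return -1;
	memset(rf->mem[s], 0xff, SECTOR_SIZE);	/* erased state: all ones */
	return 0;
}

static int ram_read(void *v, uint32_t s, uint8_t *buf)
{
	struct ram_flash *rf = v;
	if (rf->mode != FLASH_MODE_READ || s >= NSECTORS)
		return -1;
	memcpy(buf, rf->mem[s], SECTOR_SIZE);
	return 0;
}

static int ram_program(void *v, uint32_t s, const uint8_t *buf)
{
	struct ram_flash *rf = v;
	uint32_t i;
	if (rf->mode != FLASH_MODE_WRITE || s >= NSECTORS)
		return -1;
	for (i = 0; i < SECTOR_SIZE; i++)
		rf->mem[s][i] &= buf[i];	/* programming only clears bits */
	return 0;
}

const struct flash_ops ram_flash_ops = {
	ram_getinfo, ram_setmode, ram_erase, ram_read, ram_program
};

/*
 * Block-style partial write emulated as read, modify, erase, program.
 * Assumes the device is in read mode on entry; leaves it in read mode.
 */
int flash_emul_write(const struct flash_ops *ops, void *v, uint32_t sector,
    uint32_t off, const uint8_t *src, uint32_t len)
{
	uint8_t buf[SECTOR_SIZE];
	struct flash_info fi;
	int error;

	ops->getinfo(v, &fi);
	if (fi.sector_size != SECTOR_SIZE || off > SECTOR_SIZE ||
	    len > SECTOR_SIZE - off)
		return -1;
	if (ops->read(v, sector, buf) != 0)
		return -1;
	memcpy(buf + off, src, len);
	if (ops->setmode(v, FLASH_MODE_WRITE) != 0)
		return -1;
	error = ops->erase(v, sector);
	if (error == 0)
		error = ops->program(v, sector, buf);
	(void)ops->setmode(v, FLASH_MODE_READ);
	return error;
}
```

A flash-aware filesystem would call erase and program directly, while
something like flash_emul_write() is the sort of layer a plain block
interface could sit on.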

    -- Garrett
> Eduardo
>   


-- 
Garrett D'Amore, Principal Software Engineer
Tadpole Computer / Computing Technologies Division,
General Dynamics C4 Systems
http://www.tadpolecomputer.com/
Phone: 951 325-2134  Fax: 951 325-2191