Subject: Re: MTD devices in NetBSD
To: Garrett D'Amore <garrett_damore@tadpole.com>
From: Garrett D'Amore <garrett_damore@tadpole.com>
List: tech-kern
Date: 03/23/2006 10:43:50
One other point: there is non-trivial overhead in commanding a device to
enable writes to a sector.  To be performant, you really don't want to
have to do that 128 times (once per 512-byte block) to write 64K of
data.  So a good flash filesystem will probably want to write only full
sectors at a time whenever it can choose to do so.
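
To put numbers on that, here is a small C sketch under an assumed cost
model: one write-enable command per program operation, 512-byte blocks,
64K sectors.  `enable_commands` is a hypothetical helper for
illustration, not part of any driver API.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumed cost model: each program operation needs its own
 * write-enable command, however many bytes it carries.
 */
#define SECTOR_SIZE (64u * 1024u)  /* assumed 64K NOR sector */
#define BLOCK_SIZE  512u           /* assumed filesystem block size */

/* Write-enable commands needed to write len bytes in unit-sized chunks. */
static size_t enable_commands(size_t len, size_t unit)
{
    assert(unit > 0);
    return (len + unit - 1) / unit;  /* a partial chunk still costs one */
}
```

With those assumptions, `enable_commands(SECTOR_SIZE, BLOCK_SIZE)` is
128, while a single full-sector write needs just one.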

    -- Garrett

Garrett D'Amore wrote:
> Bill Studenmund wrote:
>   
>> On Thu, Mar 23, 2006 at 09:12:27AM -0800, Garrett D'Amore wrote:
>>   
>>> Jason Thorpe wrote:
>>>     
>>>> On Mar 22, 2006, at 4:17 PM, Garrett D'Amore wrote:
>>>>
>>> I'm not saying we shouldn't have a block abstraction available.  Indeed,
>>> I want to create one.  But what I am saying is that a filesystem might
>>> do better if it can operate below that abstraction.
>>>     
>> Yes!
>>
>> We can do this even within a block device.
>>
>> Well-chosen calls to your strategy routine will work smoothly, and you have 
>> an ioctl interface for things like erase and whatever other calls you 
>> need.
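
A minimal sketch of that idea, assuming a toy RAM-backed NOR device:
ordinary block I/O goes through a strategy-like entry point, and erase
goes through an ioctl-like control call.  `flash_strategy`,
`flash_ioctl` and `FLASH_IOC_ERASE` are hypothetical names; real NetBSD
strategy routines work on a `struct buf`, which this toy omits.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 4096u       /* small assumed sector keeps the toy compact */
#define NSECTORS    4u
#define BLOCK_SIZE  512u
#define FLASH_IOC_ERASE 1ul     /* hypothetical ioctl command number */

static uint8_t flash[SECTOR_SIZE * NSECTORS];  /* simulated NOR array */

/* Strategy-like block I/O; programming can only clear bits (AND). */
static int flash_strategy(int is_write, unsigned blkno, uint8_t *buf)
{
    size_t off = (size_t)blkno * BLOCK_SIZE;

    assert(buf != NULL);
    if (off + BLOCK_SIZE > sizeof(flash))
        return EINVAL;
    if (is_write) {
        for (size_t i = 0; i < BLOCK_SIZE; i++)
            flash[off + i] &= buf[i];
    } else {
        memcpy(buf, &flash[off], BLOCK_SIZE);
    }
    return 0;
}

/* Ioctl-like control path: erase sets every bit in one sector to 1. */
static int flash_ioctl(unsigned long cmd, void *data)
{
    unsigned sector;

    if (cmd != FLASH_IOC_ERASE)
        return ENOTTY;
    sector = *(const unsigned *)data;
    if (sector >= NSECTORS)
        return EINVAL;
    memset(&flash[(size_t)sector * SECTOR_SIZE], 0xff, SECTOR_SIZE);
    return 0;
}
```

In this arrangement the filesystem, not the driver, decides when to
issue the erase; a block write only ANDs new data into cells that are
already programmed.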
>>
>> I guess a way to put it is to think of using one interface in two 
>> different ways as opposed to an interface "below" another one.
>>   
>
> I've been thinking about this as well.  I think this idea implies that
> the "block" size of these things would match the native sector size. 
>
> Mapping blocks to sectors 1:1 also means that for a lot of filesystems,
> you are going to have a lot of waste (unless the filesystem allows
> files to use less than a full device block), and this could be very,
> very undesirable on some systems.  (E.g., a 128K minimum file size on
> 4MB of flash limits you to only 32 files; 16MB only gives 128 files.)
> 128K sector sizes are rare, but 64K sector sizes are *very* common, so
> you get only 256 files in the "common" 16MB case.
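
The file-count figures follow from dividing capacity by the minimum
allocation, assuming each file occupies at least one whole sector and
ignoring metadata.  A trivial C check of that arithmetic (`max_files` is
a hypothetical helper):

```c
#include <assert.h>
#include <stdint.h>

/*
 * With a 1:1 block/sector mapping and no sub-block allocation, every
 * file consumes at least one sector, so capacity / sector_size is an
 * upper bound on the file count (metadata ignored).
 */
static uint32_t max_files(uint32_t capacity, uint32_t sector_size)
{
    assert(sector_size != 0);
    return capacity / sector_size;
}
```

`max_files(4u << 20, 128u << 10)` gives 32, and
`max_files(16u << 20, 64u << 10)` gives 256.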
>
> Hence, I think 1:1 block/sector mapping is a poor (even unworkable) choice.
>
> So, if the abstraction is going to use a smaller block size -- say 512
> bytes -- to get good allocation, we have other problems:
>
> For the rest of the discussion, let's assume a 64K sector size (the most
> common NOR flash sector size, I think):
>
> A naive implementation would turn every block update into an
> erase/modify/write cycle on the containing sector.  Obviously this is
> bad, because writing (or updating) a 64K file now requires 128 erase
> cycles.  Erase takes a long time and wears out the flash.  This is
> unworkable.
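
A sketch of the naive scheme, assuming a single simulated 64K sector,
512-byte blocks, and a RAM shadow copy for read-modify-write (all names
are hypothetical).  Writing a 64K file block by block erases the sector
once per block:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE (64u * 1024u)  /* assumed 64K NOR sector */
#define BLOCK_SIZE  512u           /* assumed filesystem block size */

static uint8_t sector[SECTOR_SIZE];  /* one simulated NOR sector */
static uint8_t shadow[SECTOR_SIZE];  /* RAM copy used for read-modify-write */
static unsigned erase_count;

static void erase_sector(void)
{
    memset(sector, 0xff, SECTOR_SIZE);  /* erase sets every bit */
    erase_count++;
}

/* Naive block write: read the whole sector, erase it, patch in the
 * new block, and program the whole sector back. */
static void naive_write_block(unsigned blk, const uint8_t *data)
{
    assert(blk < SECTOR_SIZE / BLOCK_SIZE);
    memcpy(shadow, sector, SECTOR_SIZE);
    erase_sector();
    memcpy(&shadow[blk * BLOCK_SIZE], data, BLOCK_SIZE);
    for (size_t i = 0; i < SECTOR_SIZE; i++)
        sector[i] &= shadow[i];         /* programming can only clear bits */
}

/* Write a sector-sized file one block at a time. */
static void naive_write_file(const uint8_t *file)
{
    for (unsigned b = 0; b < SECTOR_SIZE / BLOCK_SIZE; b++)
        naive_write_block(b, &file[b * BLOCK_SIZE]);
}
```

Calling `naive_write_file` on a 64K buffer leaves `erase_count` at 128,
one erase per block.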
>
> So a non-naive implementation means you have to look at the bits you are
> updating to decide whether or not an erase is necessary.  This means
> knowing the "set/clear" behavior of the bits, which isn't a problem. 
> (On all the devices I've seen, erase sets every bit to 1, and you can
> only clear individual bits.)
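
Given erase-to-ones semantics (an assumption that matches the devices
described above), the test for whether a write can go straight to the
flash is a bitwise one: every 1 bit in the new data must already be 1 in
the old data.  A minimal C sketch, with `needs_erase` as a hypothetical
helper:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumes erase sets bits to 1 and programming can only clear bits.
 * new_data can be programmed over old without an erase iff no bit has
 * to go 0 -> 1, i.e. (old & new_data) == new_data for every byte.
 */
static bool needs_erase(const uint8_t *old, const uint8_t *new_data,
    size_t len)
{
    assert(old != NULL && new_data != NULL);
    for (size_t i = 0; i < len; i++)
        if ((old[i] & new_data[i]) != new_data[i])
            return true;   /* some bit would have to go 0 -> 1 */
    return false;
}
```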
>
> But now, when I'm writing a 64K file, I'm going to have to do 128
> separate read/compare/write sequences.  And if the sector unfortunately
> has a single bit clear near the end, I won't detect that until I get
> there, and I wind up having to do a read-modify-write even after I've
> done all the work to try to avoid it.
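
A sketch of the block-at-a-time approach, under the same assumptions
(one simulated 64K sector, 512-byte blocks, erase sets bits to 1; all
names hypothetical).  Each block is read back and compared before it is
programmed, so a single stray cleared bit in the last block is only
discovered after 127 blocks have already been written, and the fallback
then erases and reprograms the whole sector:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE (64u * 1024u)
#define BLOCK_SIZE  512u

static uint8_t sector[SECTOR_SIZE];   /* simulated NOR sector */
static uint8_t shadow[SECTOR_SIZE];   /* RAM copy for the fallback RMW */
static unsigned reads, programs, erases;

/* True if programming new_data over old would need some 0 -> 1 bit. */
static bool conflicts(const uint8_t *old, const uint8_t *new_data,
    size_t len)
{
    for (size_t i = 0; i < len; i++)
        if ((old[i] & new_data[i]) != new_data[i])
            return true;
    return false;
}

/* Read back and compare one block, then program it or fall back to RMW. */
static void block_write(unsigned blk, const uint8_t *data)
{
    uint8_t *cell = &sector[blk * BLOCK_SIZE];

    assert(blk < SECTOR_SIZE / BLOCK_SIZE);
    reads++;
    if (conflicts(cell, data, BLOCK_SIZE)) {
        /* Late conflict: save the sector, erase it, reprogram it all. */
        memcpy(shadow, sector, SECTOR_SIZE);
        memcpy(&shadow[blk * BLOCK_SIZE], data, BLOCK_SIZE);
        memset(sector, 0xff, SECTOR_SIZE);
        erases++;
        for (size_t i = 0; i < SECTOR_SIZE; i++)
            sector[i] &= shadow[i];
    } else {
        for (size_t i = 0; i < BLOCK_SIZE; i++)
            cell[i] &= data[i];
    }
    programs++;
}
```

Writing a 64K file of `0xa5` bytes over an erased sector whose last byte
is `0xfe` costs 128 reads, 128 program calls and one full-sector erase
in this model.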
>
> If I operate on sectors natively, and expose that to the filesystem,
> then the filesystem can do an upfront check, erase the sector as needed,
> and *then* do the write, all at once.  (Assuming again we are writing a
> 64K file.)  Since the filesystem knows it's a 64K write, it can do "the
> right thing".
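
By contrast, a sketch of the sector-level path, again on a simulated 64K
sector with hypothetical names: the filesystem checks the whole sector
up front, erases at most once, and programs the data in a single pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE (64u * 1024u)

static uint8_t sector[SECTOR_SIZE];   /* simulated NOR sector */
static unsigned programs, erases;

/* Whole-sector write: one up-front check, at most one erase, one pass. */
static void sector_write(const uint8_t *data)
{
    bool need_erase = false;

    assert(data != NULL);
    for (size_t i = 0; i < SECTOR_SIZE && !need_erase; i++)
        need_erase = (sector[i] & data[i]) != data[i];
    if (need_erase) {
        memset(sector, 0xff, SECTOR_SIZE);  /* erase sets all bits */
        erases++;
    }
    for (size_t i = 0; i < SECTOR_SIZE; i++)
        sector[i] &= data[i];               /* program clears bits */
    programs++;
}
```

For the same stray-bit scenario this model records one erase and one
program pass, instead of 128 separate block writes followed by a full
read-modify-write.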
>
> I think this means that the filesystem should *really* have a lot more
> direct control over the device, and be able to operate on sectors rather
> than blocks.  (And we've already ruled out a 1:1 sector/block mapping,
> at least if you want to be able to put some other, ordinary filesystem
> on these devices as a read-only filesystem.)
>
> Therefore, I'm coming to the conclusion that we need to expose *sectors*
> to a flash-aware filesystem, and the block abstraction is poor for these
> filesystems.
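
One possible shape for such a sector-aware interface, assuming a
hypothetical `struct flash_dev` that exposes the geometry plus separate
`erase`, `program` and `read` hooks (this is not an existing NetBSD
API), with a RAM-backed implementation so the sketch runs:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sector-aware device interface for a flash filesystem. */
struct flash_dev {
    size_t sector_size;
    size_t nsectors;
    int (*erase)(struct flash_dev *, size_t sector);
    int (*program)(struct flash_dev *, size_t off, const uint8_t *, size_t);
    int (*read)(struct flash_dev *, size_t off, uint8_t *, size_t);
    uint8_t *mem;   /* backing store for the RAM simulation below */
};

/* Reject ranges that fall outside the device. */
static int range_ok(const struct flash_dev *d, size_t off, size_t len)
{
    size_t total = d->sector_size * d->nsectors;
    return off <= total && len <= total - off;
}

static int ram_erase(struct flash_dev *d, size_t sector)
{
    assert(d != NULL);
    if (sector >= d->nsectors)
        return EINVAL;
    memset(d->mem + sector * d->sector_size, 0xff, d->sector_size);
    return 0;
}

static int ram_program(struct flash_dev *d, size_t off, const uint8_t *buf,
    size_t len)
{
    if (!range_ok(d, off, len))
        return EINVAL;
    for (size_t i = 0; i < len; i++)
        d->mem[off + i] &= buf[i];   /* programming can only clear bits */
    return 0;
}

static int ram_read(struct flash_dev *d, size_t off, uint8_t *buf,
    size_t len)
{
    if (!range_ok(d, off, len))
        return EINVAL;
    memcpy(buf, d->mem + off, len);
    return 0;
}
```

A flash-aware filesystem would choose its own allocation unit on top of
this and call `erase` only when its own bookkeeping says a sector needs
recycling, while a block-device shim for ordinary read-only filesystems
could still be layered on `read`.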
>
> Am I missing something here?
>
>     -- Garrett
>
>   
>> Take care,
>>
>> Bill
>>   
>


-- 
Garrett D'Amore, Principal Software Engineer
Tadpole Computer / Computing Technologies Division,
General Dynamics C4 Systems
http://www.tadpolecomputer.com/
Phone: 951 325-2134  Fax: 951 325-2191