Subject: Re: MTD devices in NetBSD
To: Garrett D'Amore <garrett_damore@tadpole.com>
From: Garrett D'Amore <garrett_damore@tadpole.com>
List: tech-kern
Date: 03/23/2006 10:43:50
One other point: there is non-trivial overhead in commanding a device to
enable writes to a sector. To be performant, you really don't want to
have to do that 128 times to write a 64k block. So a good flash
filesystem is probably going to want to only write full sectors at a
time if it can elect to do so.
-- Garrett
Garrett D'Amore wrote:
> Bill Studenmund wrote:
>
>> On Thu, Mar 23, 2006 at 09:12:27AM -0800, Garrett D'Amore wrote:
>>
>>
>>> Jason Thorpe wrote:
>>>
>>>
>>>> On Mar 22, 2006, at 4:17 PM, Garrett D'Amore wrote:
>>>>
>>>>
>>>>
>>> I'm not saying we shouldn't have a block abstraction available. Indeed,
>>> I want to create one. But what I am saying is that a filesystem might
>>> do better if it can operate below that abstraction.
>>>
>>>
>> Yes!
>>
>> We can do this even within a block device.
>>
>> Well-chosen calls to your strategy routine will work smoothly, and you have
>> an ioctl interface for things like erase and whatever other calls you
>> need.
>>
>> I guess a way to put it is to think of using one interface in two
>> different ways as opposed to an interface "below" another one.
>>
>>
>
> I've been thinking about this as well. I think this idea implies that
> the "block" size of these things would match the native sector size.
>
> Mapping blocks to sectors 1:1 also means that for a lot of filesystems,
> you are going to have a lot of waste (e.g. does the filesystem allow for
> files to use less than a full device block) -- and this could be very,
> very undesirable on some systems. (E.g. 128K minimum file size on 4MB
> flash limits you to only 32 files. 16MB only gives 128 files.) 128K
> sector sizes are rare, but 64K sector sizes are *very* common. So you
> get 256 files in a 16MB "common" case.
>
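The file counts above are simply flash size divided by sector size. A minimal C sketch of that arithmetic follows; the helper name max_files is hypothetical, and the bound ignores whatever space a real filesystem would need for its own metadata.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Upper bound on the number of files when every file occupies at least
 * one whole erase sector (a 1:1 block/sector mapping).
 * Hypothetical helper for illustration; ignores filesystem metadata.
 */
static size_t
max_files(size_t flash_bytes, size_t sector_bytes)
{
	assert(sector_bytes != 0);
	return flash_bytes / sector_bytes;
}
```

With 128K sectors this gives 32 files on 4MB and 128 files on 16MB; with 64K sectors, 16MB gives 256 files.
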
> Hence, I think 1:1 block/sector mapping is a poor (even unworkable) choice.
>
> So, if the abstraction is going to use a smaller block size -- say 512
> bytes -- to get good allocation, we have other problems:
>
> For the rest of the discussion, let's assume a 64K sector size (the most
> common NOR flash size, I think):
>
> A naive implementation would turn every block update into an erase/modify
> cycle on the containing sector. Obviously this is bad, because writing (or
> updating) a 64K file now requires 128 erase cycles. Erase takes a long
> time, and wears down flash. This is unworkable.
>
> So a non-naive implementation means you have to look at the bits you are
> updating to decide whether or not an erase is necessary. This means
> knowing the "set/clear" behavior of the bits, which isn't a problem.
> (The devices I've seen are all "set" on erase, and you can only clear
> individual bits.)
>
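The "look at the bits" check described above can be sketched in C as follows. This assumes the behavior stated in the text (erase sets every bit, programming can only clear bits); the function name needs_erase and its signature are hypothetical, not an existing NetBSD interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if writing `want` over `cur` requires an erase first.
 * Assumption: erased flash reads as all 1s and programming can only
 * change bits from 1 to 0. Hypothetical helper for illustration.
 */
static bool
needs_erase(const uint8_t *cur, const uint8_t *want, size_t len)
{
	assert(cur != NULL && want != NULL);
	for (size_t i = 0; i < len; i++) {
		/* A bit that is 0 now but must become 1 forces an erase. */
		if ((uint8_t)(want[i] & ~cur[i]) != 0)
			return true;
	}
	return false;
}
```

Note that the check has to read every byte it is about to overwrite, which is where the extra reads in a block-at-a-time scheme come from.
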
> But now, when I'm writing a 64K file I'm going to have to do 128 reads
> and writes. And, if the sector unfortunately has a single bit clear near
> the end, I won't have detected that up front, and I wind up having to do
> a read-modify-write even after I've done all the work to try to avoid it.
>
> If I operate on sectors natively, and expose that to the filesystem,
> then the filesystem can do an upfront check, erase the sector as needed,
> and *then* do the write, all at once. (Assuming again we are writing a
> 64k file.) Since the filesystem knows it's a 64k write, it can do "the
> right thing".
>
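To make the difference in erase counts concrete, here is a small simulation in C. It is only a sketch under the assumptions used in this thread (64K sectors, 512-byte blocks, erase sets all bits, programming only clears bits). Every name in it (sim_sector, sim_erase, sim_program, write_blocks_naive, write_sector) is hypothetical; none of this is an existing driver interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE	(64 * 1024)	/* assumed erase sector size */
#define BLOCK_SIZE	512		/* assumed block-layer block size */

/* In-memory stand-in for one flash sector, with an erase counter. */
struct sim_sector {
	uint8_t data[SECTOR_SIZE];
	unsigned erases;
};

/* Erase sets every bit in the sector. */
static void
sim_erase(struct sim_sector *s)
{
	memset(s->data, 0xff, SECTOR_SIZE);
	s->erases++;
}

/* Programming can only clear bits, so it behaves like a bitwise AND. */
static void
sim_program(struct sim_sector *s, size_t off, const uint8_t *src, size_t len)
{
	assert(off + len <= SECTOR_SIZE);
	for (size_t i = 0; i < len; i++)
		s->data[off + i] &= src[i];
}

/* True if some bit must go from 0 back to 1. */
static bool
sim_needs_erase(const uint8_t *cur, const uint8_t *want, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if ((uint8_t)(want[i] & ~cur[i]) != 0)
			return true;
	return false;
}

/* Naive block layer: each 512-byte update is read/erase/write-back. */
static void
write_blocks_naive(struct sim_sector *s, const uint8_t *src)
{
	static uint8_t shadow[SECTOR_SIZE];

	for (size_t off = 0; off < SECTOR_SIZE; off += BLOCK_SIZE) {
		memcpy(shadow, s->data, SECTOR_SIZE);
		memcpy(shadow + off, src + off, BLOCK_SIZE);
		sim_erase(s);
		sim_program(s, 0, shadow, SECTOR_SIZE);
	}
}

/* Sector-aware write: one up-front check, at most one erase. */
static void
write_sector(struct sim_sector *s, const uint8_t *src)
{
	if (sim_needs_erase(s->data, src, SECTOR_SIZE))
		sim_erase(s);
	sim_program(s, 0, src, SECTOR_SIZE);
}
```

Writing one full sector through write_blocks_naive costs 128 erases in this model, while write_sector costs at most one. A smarter block layer could skip some of those erases with a per-block check, but it still cannot know in advance that the whole sector is about to be rewritten.
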
> I think this means that the filesystem should *really* have a lot more
> direct control over the device, and be able to operate on sectors rather
> than blocks. (And we've already ruled out a 1:1 sector/block mapping,
> at least if you want to be able to put any other kind of ordinary
> filesystem on these as a read-only filesystem.)
>
> Therefore, I'm coming to the conclusion that we need to expose *sectors*
> to a flash-aware filesystem, and the block abstraction is poor for these
> filesystems.
>
> Am I missing something here?
>
> -- Garrett
>
>
>> Take care,
>>
>> Bill
>>
>>
>
>
>
--
Garrett D'Amore, Principal Software Engineer
Tadpole Computer / Computing Technologies Division,
General Dynamics C4 Systems
http://www.tadpolecomputer.com/
Phone: 951 325-2134 Fax: 951 325-2191