Subject: Re: MTD devices in NetBSD
To: Garrett D'Amore <>
From: Bill Studenmund <>
List: tech-kern
Date: 03/23/2006 12:09:40

On Thu, Mar 23, 2006 at 10:26:31AM -0800, Garrett D'Amore wrote:
> Bill Studenmund wrote:
> >
> > We can do this even within a block device.
> >
> > Well-chosen calls to your strategy routine will work smoothly, and you have
> > an ioctl interface for things like erase and whatever other calls you
> > need.
> >
> > I guess a way to put it is to think of using one interface in two
> > different ways as opposed to an interface "below" another one.
> I've been thinking about this as well.  I think this idea implies that
> the "block" size of these things would match the native sector size.

Yes & no. We can look at how cd9660 handles this, as it has the same
issue (2k sectors != 512 byte sectors).

> Mapping blocks to sectors 1:1 also means that for a lot of filesystems
> you are going to have a lot of waste (depending on whether the filesystem
> allows files to use less than a full device block), and this could be very,
> very undesirable on some systems.  (E.g. a 128K minimum file size on a 4MB
> flash limits you to only 32 files.  16MB only gives 128 files.)  128K
> sector sizes are rare, but 64K sector sizes are *very* common.  So you
> get 256 files in a 16MB "common" case.
> Hence, I think 1:1 block/sector mapping is a poor (even unworkable) choice.

Can you read less than a block in these things?

> So, if the abstraction is going to use a smaller block size -- say 512
> bytes -- to get good allocation, we have other problems:
> For the rest of the discussion, let's assume a 64K sector size (the most
> common NOR flash size, I think):
> A naive implementation would make updating a sector an erase/modify
> cycle.  Obviously this is bad, because writing (or updating) a 64K file
> now requires 128 erase cycles.  Erase takes a long time, and wears down
> flash.  This is unworkable.

Wait, I'm now confused. I thought we had one of three cases:

1) we have a flash-unaware file system sitting on a flash. This would be
intended as a r/o kinda thing to help with bring-up.

2) We have a flash-unaware file system on top of a wear-leveling layer on
the flash. This should work r/w.

3) We have a flash-aware file system sitting on a flash.

The case above isn't one of those three, so why do we care?

> So a non-naive implementation means you have to look at the bits you are
> updating to decide whether or not an erase is necessary.  This means
> knowing the "set/clear" behavior of the bits, which isn't a problem.
> (The devices I've seen are all "set" on erase, and you can only clear
> individual bits.)
> But now, when I'm writing a 64K file I'm going to have to do 128 reads
> and writes.  And if the sector unfortunately has a single bit clear near
> the end, I won't detect that case until late, and I wind up having to do a
> read-modify-write even after I've done all the work to try to avoid it.

I'm still confused. :-) 1) I don't think a file system will really use
512-byte blocks internally. You'd have to specifically set it, and I'm not
sure it'd be worth it.

2) If you're writing a 64k file, you aren't going to have 512-byte writes
coming in unless you've mis-configured dd. ;-) stdio will do 8k i/o, and
you'll get better performance with large block sizes in dd...

> If I operate on sectors natively, and expose that to the filesystem,
> then the filesystem can do an upfront check, erase the sector as needed,
> and *then* do the write, all at once.  (Assuming again we are writing a
> 64k file.)  Since the filesystem knows it's a 64k write, it can do "the
> right thing".
> I think this means that the filesystem should *really* have a lot more
> direct control over the device, and be able to operate on sectors rather
> than blocks.  (And we've already ruled out a 1:1 sector/block mapping,
> at least if you are going to want to be able to put any other kind of
> ordinary filesystem down on these for a readonly filesystem.)
> Therefore, I'm coming to the conclusion that we need to expose *sectors*
> to a flash-aware filesystem, and the block abstraction is poor for these
> filesystems.
> Am I missing something here?

I think you're painting yourself into corners we don't need to be trapped in.

If the flash-unaware fs is only used in r/o mode, why do we need to worry
about its write performance?

The "It's HARD to solve the problem" reason is quite reasonable at
times, and this may well be one.

Take care,

