Subject: Re: MTD devices in NetBSD
To: Garrett D'Amore <garrett_damore@tadpole.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 03/23/2006 13:05:21

On Thu, Mar 23, 2006 at 12:38:32PM -0800, Garrett D'Amore wrote:
> Bill Studenmund wrote:
> > On Thu, Mar 23, 2006 at 10:26:31AM -0800, Garrett D'Amore wrote:
> >
> >> Bill Studenmund wrote:
> > Can you read less than a block in these things?
>
> For NOR, absolutely.  Many NOR systems are actually mapped *directly*
> into system memory.  I presume this to be true (that you can read less
> than a sector, not the mapping bit) for NAND, but I confess I'm still
> largely ignorant of NAND.

Ok. This is important as it means that you can easily do sub-block reads.

> >> So, if the abstraction is going to use a smaller block size -- say 512
> >> bytes -- to get good allocation, we have other problems:
> >>
> >> For the rest of the discussion, let's assume a 64K sector size (the most
> >> common NOR flash size, I think):
> >>
> >> A naive implementation would make updating a sector an erase/modify
> >> cycle.  Obviously this is bad, because writing (or updating) a 64K file
> >> now requires 128 erase cycles.  Erase takes a long time, and wears down
> >> flash.  This is unworkable.
> >
> > Wait, I'm now confused. I thought we had one of three cases:
> >
> > 1) We have a flash-unaware file system sitting on a flash. This would be
> > intended as a r/o kinda thing to help with bring-up.
> >
> Yes.
> > 2) We have a flash-unaware file system on top of a wear-leveling layer on
> > the flash. This should work r/w.
> >
>
> I'm not necessarily proposing this.  Others may be, but not me.

Understood. And I'm not saying you need to implement it. But I think it's
easy to think about now. :-)

> > 3) We have a flash-aware file system sitting on a flash.
> >
> > The case above isn't one of those three, so why do we care?
>
> I think we're misunderstanding each other.  Updating a sector (for any

We probably are misunderstanding each other. :-)

> r/w case) where you modify less than the whole sector at once creates
> the problem.  This happens in case #2 above.   (And also case #3 if the
> flash aware system uses a block size != sector size, and wants to update
> a large file.  My understanding of strategy is that you only get one
> block at a time, not a list for the entire file.)

You get one "upper level" block at a time, for whatever that is. All of
the reasons for restricting the block sizes and alignments have to do with
how the block cache works (it does not cope with aliasing).

My point is that the only time that we will do writes are ones (#2 or #3)
where we have something that understands flash above us. So let's make it
help out.

So it's perfectly reasonable for the strategy routine to handle reads of
whatever size it wants and to require that writes are aligned and sized to
match the underlying device's block layout. Or you could also say that the
strategy routine only handles reads and that writes & erases have to go
through an ioctl.
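
The first option (reads of any size, writes restricted to the device's
block layout) could be checked roughly as below. This is only a sketch:
struct flash_geom, flash_read_ok() and flash_write_ok() are hypothetical
names made up for illustration, not existing NetBSD interfaces, and a real
strategy routine would work on a struct buf rather than a bare offset and
length.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device description; not an existing NetBSD structure. */
struct flash_geom {
	uint32_t sector_size;	/* erase-sector size in bytes, e.g. 64K for NOR */
	uint64_t device_size;	/* total device size in bytes */
};

/* Reads may start anywhere and be any length, as long as they fit. */
static bool
flash_read_ok(const struct flash_geom *g, uint64_t off, size_t len)
{
	return len <= g->device_size && off <= g->device_size - len;
}

/* Writes must start on a sector boundary and cover whole sectors. */
static bool
flash_write_ok(const struct flash_geom *g, uint64_t off, size_t len)
{
	if (g->sector_size == 0 || len == 0)
		return false;
	if (!flash_read_ok(g, off, len))
		return false;
	return (off % g->sector_size) == 0 && (len % g->sector_size) == 0;
}
```

With a 64K sector size, a 7-byte read at offset 100 passes, a 512-byte
write at offset 512 is rejected, and a 64K write at offset 0 passes.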

> > I'm still confused. :-) 1) I don't think a file system will really use
> > 512-byte blocks internally. You'd have to specifically set it, and I'm not
> > sure it'd be worth it.
> >
> You need small blocks in any case (64k is too wasteful).   Even at 8K
> you still have 8 read/modify cycles.

Ok, here's a key misunderstanding of mine. That's how normal file systems
work. Why are we worrying about how normal file systems work on flash if
we all agree they work poorly? Either the normal file system is there
because you bulk-loaded it (makefs then a dd that matched block sizes on
flash) or it's a flash-aware writing layer that can do the caching and be
sane about things. ??
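
For reference, the erase counts quoted in this exchange (128 for 512-byte
blocks, 8 for 8K blocks, both against a 64K sector) are just sector size
divided by block size, assuming a naive layer that erases the sector once
per block written. A small sketch of that arithmetic, with
naive_erases_per_sector() being a made-up helper name:

```c
#include <stdint.h>

/*
 * Erase cycles a naive erase/modify layer would spend writing one
 * sector's worth of data one block at a time. Returns 0 for a block
 * size that is zero or larger than the sector.
 */
static uint32_t
naive_erases_per_sector(uint32_t sector_size, uint32_t block_size)
{
	if (block_size == 0 || block_size > sector_size)
		return 0;
	return sector_size / block_size;
}
```

A layer that caches a whole sector and writes it in one go needs a single
erase, which is the case where block size equals sector size.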

> > I think you're painting yourself into corners we don't need to be trapped
> > in.
>
> Possibly.  But exposing the details rather than hiding them seems to me
> to be *avoiding* corners.  If I hide sector and wear-leveling details
> behind some kind of meta-device or flash translation layer, then I fear
> it will limit choices that we can make otherwise.

I don't think so. The only significant hiding layer in this discussion, I
think, is the wear-leveling layer. I think it's perfectly reasonable for a
flash to attach (and have all the semantics other than wear-leveling we've
discussed) then have a wear-leveler attach to that.

> > If the flash-unaware fs is only used in r/o mode, why do we need to worry
> > about its write performance?
>
> We don't.  But the flash-aware filesystem needs to have access to
> something other than blocks/strategy, I think.

As above, I think you can get a lot of mileage out of the strategy routine.
But I partially suspect that I see it as a more flexible interface than I
gather you do. :-) Other than explicit erase, I think it can easily do
what you want to do in the near term.

> > The, "It's HARD to solve the problem," reason is quite reasonable at
> > times, and this may well be one.
>
> Heh.  Maybe.  I'm trying to make the problem tractable, because I need
> to implement *something*, and soon.

I guess that's also part of my frustration. I genuinely do not think I'm
making more work for you! :-) To me it seems more like, "instead of
jumping left, forward, left, right, right, right, jump right, forward".
:-)

If you think I'm making more work, one or both of us is misunderstanding
the other. :-) I freely admit it might be me. ;-)

Take care,

Bill
