Subject: Re: /dev on tmpfs problem
To: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
From: Daniel Carosone <dan@geek.com.au>
List: tech-kern
Date: 11/14/2005 11:08:24

On Mon, Nov 14, 2005 at 08:34:30AM +0900, YAMAMOTO Takashi wrote:
> > > note that allocating memory for tmpfs implies
> > > reclaiming other use of memory including file cache.
> >
> > Indeed.  Since that has to happen anyway, why do we need another knob?
>
> because the desired size of tmpfs can not be calculated from
> the total amount of memory.

To me, it's "all memory+swap not otherwise in use for something else,
up to the -s limit if specified".  As these things change, so does the
size of tmpfs (as seen in df).
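
To make that concrete, here's a minimal C sketch of the rule as I
mean it.  All the names are made-up stand-ins for illustration, not
the actual tmpfs or uvm identifiers:

    /*
     * Sketch only: "all memory+swap not otherwise in use", plus what
     * this mount already holds, capped at -s.  Inputs are hypothetical
     * stand-ins for the real uvm/tmpfs state.
     */
    #include <stdint.h>

    uint64_t
    tmpfs_size_pages(uint64_t free_pages, uint64_t free_swap_pages,
        uint64_t pages_used_by_mount, uint64_t s_limit_pages)
    {
            /* this mount's own pages don't shrink its reported size */
            uint64_t size = free_pages + free_swap_pages +
                pages_used_by_mount;

            /* honour an explicit -s (0 meaning "no limit given") */
            if (s_limit_pages != 0 && size > s_limit_pages)
                    size = s_limit_pages;
            return size;
    }

df would then report this as the filesystem size, with the free
figure moving as other consumers come and go.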

> > Is there a reason tmpfs can't consider (clean) filecache pages above
> > filemin or filemax as "free" for it's purposes (assuming no explicit
> > -s was given, of course)? Perhaps there is a reason, like we can't
> > sleep for free pages at the relevant point, but if that's so I can't
> > see that tmpfs can work even with swap.
>
> filemin/max is not related.

Not in the current implementation, no.  But the issue is that they're
another example of "all memory not in use for something else", and the
two uses conflict.

My point is simply to suggest that tmpfs recognise this other usage,
and not count clean filecache pages as "used" for the purpose of
freespace/size reporting.  It can then compete fairly with the
filecache for those pages, within the existing bounds and rules.
Right now, tmpfs seems to give in too much in this competition, and
won't contend with the filecache.

The vm.file* suggestions are a potential refinement: for example,
recognising that vm.filemin pages should stay allocated for that
purpose, and so tmpfs should only consider filecache pages above that
level as "free" in the above.
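
As a sketch of that refinement (hypothetical names again; as I
understand it, vm.filemin is a percentage, so it gets converted to
pages first):

    /*
     * Sketch only: treat clean file-cache pages above the vm.filemin
     * floor as reclaimable when reporting tmpfs free space.  Names
     * are stand-ins, not the real uvm fields.
     */
    #include <stdint.h>

    uint64_t
    tmpfs_avail_pages(uint64_t free_pages, uint64_t free_swap_pages,
        uint64_t clean_file_pages, uint64_t managed_pages,
        int filemin_pct)
    {
            uint64_t filemin_pages = managed_pages * filemin_pct / 100;
            uint64_t reclaimable = 0;

            /* only file pages above the filemin floor are fair game */
            if (clean_file_pages > filemin_pages)
                    reclaimable = clean_file_pages - filemin_pages;

            return free_pages + free_swap_pages + reclaimable;
    }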

> > This makes a very clear concept for the user: vm.file* pages are to be
> > used for filecache, and tmpfs looks just like filecache but backed by
> > swap rather than files.  Therefore, dirty tmpfs pages should reclaim
> > clean filecache pages. If there's swap to back them, idle tmpfs pages
> > should eventually get cleaned and the space reused for more active
> > files or other needs.
>
> - tmpfs pages are counted as anon pages.

Yes, they are, once used and in active competition.  But tmpfs needs
to know how many more pages it might offer the user, and it should be
able to take such pages from file pages, especially clean ones.
Perhaps this means a further refinement to the tmpfs freespace
calculation using vm.anon* too?
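
One way to read that (purely a sketch of my interpretation, with
invented names): since tmpfs pages count as anon, stop offering new
space once anon usage would pass the vm.anonmax ceiling.

    /*
     * Sketch only: headroom left before anon usage (which includes
     * tmpfs pages) would exceed the vm.anonmax ceiling.  Hypothetical
     * names, not real uvm fields.
     */
    #include <stdint.h>

    uint64_t
    anon_headroom_pages(uint64_t anon_pages, uint64_t managed_pages,
        int anonmax_pct)
    {
            uint64_t anonmax_pages = managed_pages * anonmax_pct / 100;

            return (anon_pages < anonmax_pages) ?
                anonmax_pages - anon_pages : 0;
    }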

> - i don't think that vm sysctl is a clear concept for users anyway. :-)

Maybe not, but I think the current tmpfs behaviour makes the situation
even worse, and adding another vm.tmpfs* set won't help either.  These
figures are in competition, and so their interaction needs to be
clear.

Having swap available just hides the issue, because those pages are
added to the tmpfs freespace potential.

> > If I use tmpfs with no backing swap, I have to recognise that I might
> > fill my filecache memory with tmpfs files that can't be paged out, and
> > thus starve myself of other cache or memory needs; I can use -s to
> > limit the size allocated to tmpfs to help here, if I want.
>
> you have to do so unless you have an infinite amount of swap.

Not at all.  The size of tmpfs is bounded by the size of swap (and
other active allocations).  I should even be able to remove some
swap space (if unallocated) and see the free space in /tmp drop.

-s is a useful bound when the issue is competition between tmpfs and
these other allocations.  This is the reverse situation, because
there's now competition for free-space slop between the file cache
and the tmpfs available-space reporting.

I'm only talking about the free space reporting, such as shown by
df.  It seems this also causes tmpfs writes to return ENOSPC (which
should otherwise only happen on an actual allocation failure, or when
the -s limit is hit).  On the assumption, outlined in another mail,
that this is the core of the issue here, I'd like to know whether
there's an actual uvm allocation failure too.
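
Put another way, as a hypothetical sketch (not the real tmpfs write
path), ENOSPC should only come from these two cases:

    /*
     * Sketch only: when a tmpfs write may legitimately fail with
     * ENOSPC.  The free-space estimate alone shouldn't cause it.
     */
    #include <stdint.h>

    int
    tmpfs_write_enospc(uint64_t used_pages, uint64_t want_pages,
        uint64_t s_limit_pages, int uvm_alloc_failed)
    {
            /* explicit -s limit hit */
            if (s_limit_pages != 0 &&
                used_pages + want_pages > s_limit_pages)
                    return 1;
            /* actual uvm allocation failure */
            if (uvm_alloc_failed)
                    return 1;
            return 0;
    }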

--
Dan.