Subject: Re: free space (was /dev) on tmpfs problem
To: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
From: Daniel Carosone <dan@geek.com.au>
List: tech-kern
Date: 11/14/2005 20:38:55
--l/R/0nUjHCrbxDky
Content-Type: multipart/mixed; boundary="qx542RXz7QAsGbWW"
Content-Disposition: inline


--qx542RXz7QAsGbWW
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Nov 14, 2005 at 11:49:26AM +0900, YAMAMOTO Takashi wrote:
> yes, my desire is different from yours.
> i understand your desire, but i dislike it.

likewise.. :-)

I don't think mine ends up looking so bad in implementation, either
(see below).

> > Put another way: with your version of -s, do we have to teach
> > the vm system not to steal pages the admin has 'promised' to tmpfs
> > with -s?  Otherwise, when the machine fills with process anons, tmpfs
> > will report free space it can't use.
>=20
> in that case, it can wait for memory ~forever.
> i don't see it as a serious problem because it just mean
> what admin specified was unreasonable.

I don't think that's different to now, and I agree with you, and
furthermore I think it's a different problem (for which -s is
intended).

Your patch, and mine, are both just different ways to calcualte free
space shown by df. In either case, and now, if that calculation shows
0, tmpfs will refuse (via ENOSPC) to try to allocate.

Having swap space available, or causing df to report (size - used)
free always, are both ways to hide this issue. Either lets tmpfs
continue on to allocation, and make the pagedaemon evict some other
pages to make room for the new tmpfs data.

It's a separate issue if tmpfs tries to allocate and has to wait a
long time or gets another kind of allocation failure.  That's a case
where it's in competition with something else for real memory (rather
than cache), and perhaps where the admin needs to do something to
better control limits of various kinds on the system (include -s, or a
better value for -s), or provision more resources. No different to
now.

> alternatively, it can return ENOSPC.  in this case,
> df won't show what you want.

Agreed.  That's not the main reason I dislike forcing -s, but it's one
of them.

> in either way, i don't see any problem.  as tmpfs is not
> a normal filesystem, what df show might not be the same as
> what you expected for normal filesystems.  so what?
> it merely means that you need to change your mind.

Yep. I have no problem with the size (and free space) of tmpfs
changing to reflect other system usage. This is a feature, IMHO.
Except that it presently reflects some usage it shouldn't, such as
caches. That accounting bug should be fixed, with the cute side-effect
that df produces meaningful numbers, but mostly to solve the original
problem.

Fixing that problem in tmpfs_subr.c::tmpfs_mem_info() is pretty
simple, see diff #1.  With this diff, my previous example with df's
and swapctl -a/-d works as I'd expect.

Furthermore, fixing that function to not look like it needs to know
uvm internals (use only uvmexp.*), do much less work and therefore
also to kill the concerns behind the XXX comments at the top, is also
pretty simple, see diff #2.

A slightly different approach would be to use uvmexp.inactive rather
than uvmexp.filepages.  I think I prefer this, actually, though I'd
like to watch it in action a little longer, it might need something
slightly different.

--
Dan.

--qx542RXz7QAsGbWW
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="tmpfs.1.diff"
Content-Transfer-Encoding: quoted-printable

Index: tmpfs_subr.c
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
RCS file: /cvsroot/src/sys/fs/tmpfs/tmpfs_subr.c,v
retrieving revision 1.14
diff -u -r1.14 tmpfs_subr.c
--- tmpfs_subr.c	11 Nov 2005 15:50:57 -0000	1.14
+++ tmpfs_subr.c	14 Nov 2005 08:51:28 -0000
@@ -951,6 +951,7 @@
 			    PAGE_SIZE;
 	}
 	size +=3D uvmexp.free;
+	size +=3D uvmexp.filepages;
=20
 	free(sep, M_TEMP);
=20

--qx542RXz7QAsGbWW
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="tmpfs.2.diff"
Content-Transfer-Encoding: quoted-printable

Index: tmpfs_subr.c
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
RCS file: /cvsroot/src/sys/fs/tmpfs/tmpfs_subr.c,v
retrieving revision 1.14
diff -u -r1.14 tmpfs_subr.c
--- tmpfs_subr.c	11 Nov 2005 15:50:57 -0000	1.14
+++ tmpfs_subr.c	14 Nov 2005 09:02:08 -0000
@@ -919,40 +919,20 @@
  * Remember to remove TMPFS_PAGES_RESERVED from the returned value to avoid
  * excessive memory usage.
  *
- * XXX: This function is used every time TMPFS_PAGES_MAX is called to gath=
er
- * the amount of free memory, something that happens during _each_
- * object allocation.  The time it takes to run this function so many
- * times is not negligible, so this value should be stored as an
- * aggregate somewhere, possibly within UVM (we cannot do it ourselves
- * because we can't get notifications on memory usage changes).
  */
 size_t
 tmpfs_mem_info(boolean_t total)
 {
-	int i, sec;
-	register_t retval;
 	size_t size;
-	struct swapent *sep;
-
-	sec =3D uvmexp.nswapdev;
-	sep =3D (struct swapent *)malloc(sizeof(struct swapent) * sec, M_TEMP,
-	    M_WAITOK);
-	KASSERT(sep !=3D NULL);
-	uvm_swap_stats(SWAP_STATS, sep, sec, &retval);
-	KASSERT(retval =3D=3D sec);
=20
 	size =3D 0;
 	if (total) {
-		for (i =3D 0; i < sec; i++)
-			size +=3D dbtob(sep[i].se_nblks) / PAGE_SIZE;
+		size +=3D uvmexp.swpages;
 	} else {
-		for (i =3D 0; i < sec; i++)
-			size +=3D dbtob(sep[i].se_nblks - sep[i].se_inuse) /
-			    PAGE_SIZE;
+		size +=3D uvmexp.swpgavail;
 	}
 	size +=3D uvmexp.free;
-
-	free(sep, M_TEMP);
+	size +=3D uvmexp.filepages;
=20
 	return size;
 }

--qx542RXz7QAsGbWW--

--l/R/0nUjHCrbxDky
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (NetBSD)

iD8DBQFDeFsvEAVxvV4N66cRAtJDAKDghyjnyRN31agwETwn32UJkS78cgCgjq89
ZEVViIiXMWYl93e4AtBr5zM=
=uiit
-----END PGP SIGNATURE-----

--l/R/0nUjHCrbxDky--