Subject: Re: Resetting ip, icmp etc statistics
To: Havard Eidnes <he@uninett.no>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-net
Date: 03/31/2006 23:24:03

On Fri, Mar 31, 2006 at 10:54:34PM +0200, Havard Eidnes wrote:
>
> I can understand the convenience to those who don't run SNMP and
> who just fixed a problem and want to see "what does it look like
> now" or "let's leave it for a few hours to see how it does after
> that", and not having to reboot to reset the statistics.
>
> "Real" routers where statistics are normally collected using SNMP
> and which also have this "clear counters" function do this IMO the
> right way, i.e. by checkpointing the base level for the counters
> and only displaying the difference compared to this base level
> when commands equivalent to "netstat" are run.
>
> I'm not sure what I think of the "add a sysctl to prevent
> accidental resetting of the counters" idea.  As has been noted it
> would only prevent accidental resetting, and is more of a
> stopgap because people for some reason or other balk at the
> "checkpoint base level to diff against for the user interface"
> solution.  "Bloat" might be an argument against this solution.

Well, I don't see the sysctl as a stopgap. :-)

My concern with "checkpointing", as it's described so far, is that it
really only permits you one level of checkpointing, for a lot of work.
We: 1) double the number of counters, 2) increment these extra counters
(ok, this isn't so bad, and could be adjusted with some implementation
magic so that you only increment one counter), 3) teach almost all
counter-using programs about these new counters, and 4) teach
administrators about all these new counters. And we only get one
checkpoint.

All that work for one checkpoint.

It doesn't scale well. Say one administrator wants to reset the counters
to measure something. That works, unless another administrator had already
reset them and was still trying to track something else. Then that
administrator has lost. :-(
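
As a rough illustration of that collision, here is a minimal Python model
of a counter with a single shared base level. The class and the counter
name are hypothetical, made up for this sketch; they do not correspond to
any existing kernel structure.

```python
class SingleCheckpointCounter:
    """Hypothetical model: one raw counter plus ONE shared base level."""

    def __init__(self):
        self.raw = 0    # monotonic count, what an SNMP daemon would read
        self.base = 0   # the single checkpoint shared by everyone

    def increment(self, n=1):
        self.raw += n

    def clear(self):
        # "Clear counters": move the one shared base level up to now.
        self.base = self.raw

    def displayed(self):
        # What a netstat-like tool would show to the user.
        return self.raw - self.base


ipackets = SingleCheckpointCounter()
ipackets.increment(100)
ipackets.clear()        # administrator A starts measuring
ipackets.increment(40)
ipackets.clear()        # administrator B starts measuring, moving A's base
ipackets.increment(5)

# A wanted the 45 packets seen since A's reset, but only B's view survives.
print(ipackets.displayed())  # 5
```

The raw counter stays monotonic, but the second clear() silently discards
the first administrator's baseline.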

Part of my reason for suggesting the sysctl is that I really see the
problem as either you care about monotonicity (you have SNMP daemons,
etc.) or you don't. If you don't care, you don't care. There is no need
for worrying about monotonic counters, since by definition, they don't
matter. :-)

If you do care, then yes, you really need to make it so folks can't reset
the counters. The sysctl is one way. If cooperative control isn't enough,
then when we get something like kauth's bitmask for controlling things, we
can add a bit to freeze the sysctl; if it needs locking, we'll lock it.

But if you care, you care because you've set up tools that monitor these
counters. If you've set up some tools, is it really that much to expect
you to add a few more? I could be wrong, but, given the diversity of
sysadmin tools I've seen, I find it hard to believe that someone hasn't
already come up with a tool that will let you checkpoint a value and
watch it grow with time, i.e. something that will read the value, put it
in a file, then subtract that value in the future for reporting
measurements.

Given such a tool, checkpoint in userland. Let the kernel keep one
counter, and leave checkpointing to the data gathering tools. :-)
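
A minimal sketch of what such a userland tool could look like, in Python.
This is an assumption-laden illustration, not an existing utility: the
function names, the JSON file format, and the counter names are all made
up, and how the current values are read from the system (parsing netstat
output, for example) is deliberately left out.

```python
import json
import os
import tempfile


def save_checkpoint(path, counters):
    """Record the current counter values as a base level in a file."""
    with open(path, "w") as f:
        json.dump(counters, f)


def delta_since(path, counters):
    """Subtract the saved base level from the current counter values."""
    with open(path) as f:
        base = json.load(f)
    return {name: value - base.get(name, 0)
            for name, value in counters.items()}


# Two administrators checkpoint at different times, each in their own file.
# The counter names and values below are invented for the example.
workdir = tempfile.mkdtemp()
alice = os.path.join(workdir, "alice.json")
bob = os.path.join(workdir, "bob.json")

save_checkpoint(alice, {"ip_total": 1000, "icmp_error": 7})
save_checkpoint(bob, {"ip_total": 1500, "icmp_error": 7})

now = {"ip_total": 1800, "icmp_error": 9}
alice_delta = delta_since(alice, now)
bob_delta = delta_since(bob, now)
print(alice_delta)  # {'ip_total': 800, 'icmp_error': 2}
print(bob_delta)    # {'ip_total': 300, 'icmp_error': 2}
```

Because each checkpoint is just a file, any number of them can coexist,
and the kernel's counter is never touched.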

Thus one counter really can serve us well.

Take care,

Bill
