Re: event counting vs. the cache

To: tech-kern%netbsd.org@localhost
Subject: Re: event counting vs. the cache
From: Simon Truss <simon%bigblue.demon.co.uk@localhost>
Date: Sat, 19 Jan 2013 09:18:59 +0000

David Young wrote:

It's customary to use a 64-bit integer to count events in NetBSD because
we don't expect for the count to roll over in the lifetime of a box
running NetBSD.

I've been thinking about what these wide integers do to the cache
footprint of a system and wondering if we shouldn't make a couple of
changes:

1) Cram just as many counters into each cacheline as possible.
   Extend/replace evcnt(9) to allow the caller to provide the storage
   for the integer.

   On a multiprocessor box, you don't want CPUs sharing counter
   cachelines if you can help it, but do cram together each individual
   CPU's counters.

2) Split all counters into two parts: high-order 32 bits, low-order 32
   bits.  It's only necessary to touch the high-order part when the
   low-order part rolls over, so in effect you split the counters into
   write-often (hot) and write-rarely (cold) parts.  Cram together the
   cold parts in cachelines.  Cram together the hot parts in cachelines.
   Only the hot parts change that often, so the ordinary footprint of
   counters in the cache is cut almost in half.


could I suggest:

3) use smaller counters guaranteed not to overflow during a definedinterval. These counters must incremented ignoring counter overflow.Abackground process gathers these keeping a copy of the last read value.By taking the unsigned difference between new count and old count andadding this to 64 bit count you can keep a 64bit counter API on thestats reader side.


notes:

a) The small counters could be kept as per cpu counters and thebackground gather operation iterates across cpus collecting aggregates.b) so long as read and writes of small counters are atomic no locking isrequired and sometimes cache coherent accesses can be relaxed.c) the stats generator (fast path) should be able to be kept clean andsimple, while the gather operation that occurs less often will have toperform the cache coherent reads.d) The gather operation could also be initiated by a read of aparticular 64 bit stats counter if up to date information is required.e) The timing of the background gather operation could be self tuning toprevent overflow of the small counters. Just keep sufficient clearbits in the high order bits of the unsigned(new - old) delta result.f) overflow of the small counters cannot be proven, in that the countdelta will probably jump between large and small values when overflowoccurs.

I suppose you could split counters into four or more parts of 16 or
fewer bits each, and in that shrink the footprint even further, but it
seems that you would reach a point of diminishing returns very quickly.

Perhaps this has been tried before and found to (not) work reasonably
well?

Dave

References:
- event counting vs. the cache
  - From: David Young

Prev by Date: Re: event counting vs. the cache
Next by Date: Re: event counting vs. the cache
Previous by Thread: Re: event counting vs. the cache
Next by Thread: Re: event counting vs. the cache
Indexes:

Home | Main Index | Thread Index | Old Index