tech-kern archive


Re: event counting vs. the cache



On Thu, Jan 17, 2013 at 11:10:24PM +0000, David Laight wrote:
> On Thu, Jan 17, 2013 at 03:43:13PM -0600, David Young wrote:
> > 
> > 2) Split all counters into two parts: high-order 32 bits, low-order 32
> >    bits.  It's only necessary to touch the high-order part when the
> >    low-order part rolls over, so in effect you split the counters into
> >    write-often (hot) and write-rarely (cold) parts.  Cram together the
> >    cold parts in cachelines.  Cram together the hot parts in cachelines.
> >    Only the hot parts change that often, so the ordinary footprint of
> >    counters in the cache is cut almost in half.
> 
> That means you have to have special code to read them in order to
> avoid 'silly' values.

We can end up with silly values with the status quo, too, can't we?  On
32-bit architectures like i386, x++ for uint64_t x compiles to

        addl $0x1, x
        adcl $0x0, x+4

If the addl carries, then reading x between the addl and adcl will show
a silly value.

I think the silly values can be avoided.  Say you're using per-CPU
counters, and counter x belongs to CPU p.  Avoid silly values by
reading x in a low-priority thread, t, that's bound to p and reads
hi(x), then lo(x), then hi(x) again.  If hi(x) changed, then t was
preempted by a thread or an interrupt handler that wrapped lo(x), so
t has to restart the sequence.

Dave

-- 
David Young
dyoung%pobox.com@localhost    Urbana, IL    (217) 721-9981

