tech-kern archive


Re: Is there a way to obtain a machine's cache line size?



On 21 Jan 2011, at 04:29, Sad Clouds wrote:
> On Thu, 20 Jan 2011 11:59:03 +0800
> Dennis Ferguson <dennis.c.ferguson%gmail.com@localhost> wrote:
> 
>> Hello,
>> 
>> Is there a way to obtain the correct cache line size for the machine
>> code is running on, both in the kernel and at user level?  I see
>> there is a compile time constant CACHE_LINE_SIZE in <sys/param.h>
>> which currently seems to always be set to 64, but I'm pretty
>> certain that is not necessarily a correct value.  For example, I'm
>> pretty sure that for the PowerPC 32 and 128 are possibilities, and
>> the same binary could run on machines with either line size so there
>> is no correct compile-time answer.
> 
> You probably won't find many processors with cache lines greater than
> 64 bytes. If you're optimising for a particular processor, read
> technical manuals to find out the size of cache lines, then simply
> define CACHE_LINE_SIZE or whatever compile time constant you're using
> to a different value.

I'm not sure about other processors, but I think all 64-bit PowerPC
processors have 128-byte data cache lines; the G5 certainly does.
Other models have 32- or 64-byte data cache lines, so the same (32-bit)
binary could run on machines with any of those line sizes.  It could
also be running on a uniprocessor, where the best thing to do (at
least when the concern is false sharing in threaded programs) is to
ignore cache lines altogether.  I don't see a good reason to optimize
a program for just one of those cases when, if you could obtain
the right number at run time (which the operating system should
know), you could do the best thing for all of them.
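
A minimal sketch of what a user-level run-time query might look like,
under stated assumptions: glibc exposes a non-standard sysconf() name,
_SC_LEVEL1_DCACHE_LINESIZE, and the code below uses it only where the
headers define it.  Nothing here claims that NetBSD provides that name.
dcache_line_size() and FALLBACK_LINE_SIZE are hypothetical names made up
for illustration, and the fallback of 64 simply mirrors the
CACHE_LINE_SIZE value discussed above.

```c
#include <stddef.h>
#include <unistd.h>

/* Assumption: used only when the system cannot report a line size. */
#define FALLBACK_LINE_SIZE 64

/*
 * Hypothetical helper: return the L1 data cache line size in bytes,
 * asking the system at run time where a query is available.
 */
size_t
dcache_line_size(void)
{
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    /* glibc extension; may return 0 or -1 when the size is unknown. */
    long line = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);

    if (line > 0)
        return (size_t)line;
#endif
    /* No usable answer: fall back to the compile-time guess. */
    return FALLBACK_LINE_SIZE;
}
```

A program would call dcache_line_size() once at startup and keep the
result, so that one binary can adapt to 32-, 64- or 128-byte lines.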

The fact is, though, that the issue of false sharing is for the most
part independent of architecture and processor.  Data caches on modern
(and even old) multiprocessors all work pretty much the same way, have
the same issues with sharing, and have their performance improved
by the same considerations.  A single cache miss can cost many hundreds
of instructions, so moving data that could make good use of the cache
off cache lines that are, by program design, bound to be frequently
invalidated can easily make a significant difference.
If you can obtain the right number at run time for the machine
you are running on, it is usually fairly simple to write code that
always does the right thing.
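
As an illustration of how little code that can take, here is a sketch
of per-thread counters spaced by a line size supplied at run time, so
that no two counters share a cache line.  The names counter_set,
counter_set_init(), counter_slot() and counter_set_free() are
hypothetical, and the sketch assumes a C11 library that provides
aligned_alloc() and a line size that is a power of two.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container: one 64-bit counter per slot (e.g. per thread). */
struct counter_set {
    unsigned char *base;   /* allocation aligned to the stride */
    size_t stride;         /* bytes between consecutive counters */
    size_t nslots;
};

/*
 * line_size is whatever the run-time query reported and must be a power
 * of two.  Passing a small value such as 1 packs the counters densely,
 * which suits a uniprocessor where false sharing cannot occur.
 */
static int
counter_set_init(struct counter_set *cs, size_t nslots, size_t line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);

    /* A slot must at least hold (and align) one counter. */
    if (line_size < sizeof(uint64_t))
        line_size = sizeof(uint64_t);
    if (nslots == 0 || nslots > SIZE_MAX / line_size)
        return -1;

    /* One counter fits in one line, so the stride is the line size. */
    cs->base = aligned_alloc(line_size, line_size * nslots);
    if (cs->base == NULL)
        return -1;
    memset(cs->base, 0, line_size * nslots);
    cs->stride = line_size;
    cs->nslots = nslots;
    return 0;
}

/* Return the counter owned by slot i. */
static uint64_t *
counter_slot(struct counter_set *cs, size_t i)
{
    assert(i < cs->nslots);
    return (uint64_t *)(void *)(cs->base + i * cs->stride);
}

static void
counter_set_free(struct counter_set *cs)
{
    free(cs->base);
    cs->base = NULL;
}
```

Each thread then updates only *counter_slot(&cs, its_index).  The same
binary spaces the counters 32, 64 or 128 bytes apart depending on what
the machine reports, at the cost of some memory on the larger sizes.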

Dennis Ferguson

