uvm & percpu

To: tech-kern%netbsd.org@localhost
Subject: uvm & percpu
From: Antti Kantee <pooka%cs.hut.fi@localhost>
Date: Tue, 1 Jun 2010 16:03:19 +0300

While reading the uvm page allocator code, I noticed it tries to allocate
from percpu storage before falling back to global storage.  However, even
when the allocation from local storage succeeds, a global stats counter is
incremented (e.g. "uvmexp.cpuhit++").  In my measurements I've observed that
this type of "cheap" statcounting has a huge impact on percpu algorithms,
as you still need to load and store a globally contended memory address.
Furthermore, uvmexp cache lines are probably more contended than the page
queue, so theoretically you get less than half of the possible benefit.
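
To make the point above concrete, here is a minimal userland sketch of
what a per-CPU hit counter could look like.  It is not the uvm code: the
names pcpu_stat, stat_cpuhit() and stat_cpuhit_total(), the NCPU value
and the 64-byte cache line size are all assumptions made for
illustration, and the CPU index is passed in by hand instead of coming
from the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU        4    /* hypothetical CPU count for this sketch */
#define CACHE_LINE  64   /* assumed cache line size in bytes */

/*
 * One counter per CPU.  The padding spaces the counters CACHE_LINE
 * bytes apart, so counters of different CPUs never share a line.
 */
struct pcpu_stat {
	uint64_t cpuhit;
	char pad[CACHE_LINE - sizeof(uint64_t)];
};

static struct pcpu_stat pcpu_stats[NCPU];

/* Fast path: touches only the slot belonging to the given CPU. */
static void
stat_cpuhit(unsigned cpu)
{
	assert(cpu < NCPU);
	pcpu_stats[cpu].cpuhit++;
}

/* Slow path: sum all slots only when someone reads the statistic. */
static uint64_t
stat_cpuhit_total(void)
{
	uint64_t sum = 0;

	for (unsigned i = 0; i < NCPU; i++)
		sum += pcpu_stats[i].cpuhit;
	return sum;
}
```

The trade-off in this sketch is that the increment stays local to one
CPU while the reader pays for walking every slot, which should be fine
for statistics that are read rarely.  In a real kernel the slot would
have to be picked without the thread migrating between the lookup and
the increment; that part is left out here.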

I don't expect anyone to remember what the benchmark used to justify
the original percpu commit was, but if someone is going to work on it
further, I'm curious as to how much gain the percpu allocator produced
and how much more it would squeeze out if the global counter were left out.

The above example of course applies more generally.  When you're going
all out with the bag of tricks, "i++" can be very expensive ...

Follow-Ups:
- Re: uvm & percpu
  - From: Andrew Doran
- Re: uvm & percpu
  - From: Sad Clouds
