MP locking?

I have some kernel code which was written for a pre-MP kernel; it uses
spl*() for locking.  I'd like to roll this forward to something at
least slightly more modern - specifically, a dual-CPU 4.0.1 machine.

lock(9) outlines locking facilities which I believe I can use to do
what I want, but there are other issues, such as cache coherency.  Do I
need to do anything special with shared data structures to ensure
coherency between processors?  Is it enough to declare them volatile,
or do I need memory barriers as well?  And to what extent can I use
locks from within code invoked by callouts?

