[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: Please do not yell at people for trying to help you.
On 2010-11-13 03:26, Thor Lancelot Simon wrote:
On Sat, Nov 13, 2010 at 01:11:01AM +0100, Johnny Billquist wrote:
I just wanted a mutex and a lock interface which I could inline, and
which only had the idea that I wanted to take or release a mutex or a
lock. Some additional functions, like upgrading and downgrading rwlocks
is also perfectly fine, as well as just trying to take locks, and to be
able to query the lock/mutex for various information.
For some reason, this wish is apparently an horrible idea to most people
around here. For that I can only say that I'm sorry for you.
Among other things, that interface has to be designed so that it:
1) Is efficient and correct on multiprocessor systems with
considerably divergent memory and bus ordering constraints
and coherency models.
2) Does not needlessly smash cache lines and stall modern CPUs
waiting to access main memory.
Further, you also seem to be asking for less -- or no? -- use of atomic
CAS in the MI parts of the kernel, since VAX doesn't have it. But at that
point you're basically headed for a policy where you can't use lockless
concurrent algorithms at all -- fine on a uniprocessor like the VAX models
we support, where memory and the bus aren't so many cycles away anyhow; not
fine at all on most modern systems, whether MIPS, powerpc, ARM, x86, etc.
While I did call out that atomic_cas was used explicitly in a few parts
of the kernel, I was mostly asking if it was really needed there, with
the unspoken assumption that is probably was, my main issues the whole
time have been about locks and mutexes.
As such, the argument about lockless algorithms are hardly relevant, as
a lock is by definition not a lockless thing.
And while I agree that lockless algorithms are nice, they don't really
help much in a discussion about problems with our lock implementation.
And I'm focusing on the lock implementation, as that is something that
is executed a lot, and I seem to observe a significant slowdown in the
system, even without much user activity going on. With a system that
indicates that the CPU is mostly idle, I still can observe slowness. And
if I start doing something heavy, I observice rather a lot of system
time, as well as more interrupt time than in the past.
But as I said before, I'm open to other suggestions on where to hunt for
the lost (wasted?) resources.
It's pretty clear, looking at the VAX architecture and many of the early
VAX models, that these kinds of issues were not really high on the agenda
for the original VAX architects, if they were on the agenda at all. Surely
people at IBM and elsewhere had explored them and various publications in
the mid/late 80's and 1990s (like the Synthesis papers) generally raised
awareness of them in the Unix community; but there's a _lot_ of old hardware
out there that really gets it wrong (R2000, anyone?) and almost no modern
hardware that does.
Just read the VAX Architecture Reference Manual, to see that DEC was
indeed thinking about performance and problems of locks in
A typical spinlock on a VAX should be written as:
1: BBSSI $0,lock,2f
2: BBS $0,lock,2b
the point being that you test and set the lock at first. If the lock is
already held, you let the CPU spin on the lock, but just hitting the
local cache, thereby not go hitting on a memory location that will stall
other CPUs, nor bypass the local cache, or cause other problems.
When another CPU writes to the lock word, the cache invalidation
protocol between the CPUs will force the spinning CPU to reload the
value from main memory again, and if the lock is still free, it will
make a new attempt to grab the lock.
That seems to me rather thought of, and the architecture designed for
multiprocessor systems, and locks in them. And this was in the 80s.
Heck, actually, it was the 70s.
And I think we do, or atleast did, at some time manage to get some
mutiprocessor VAXen off the ground in NetBSD?
Most of the examples of how to do this or that with one VAX instruction
that have been raised in this thread have turned out to not actually work.
Matt did offer a rather wonderfully perverse way to do RAS...
And I just once more demonstrated how to implement a lock on the VAX
that is atomic. However, it is not anywhere near a CAS.
Now, wouldn't it be wonderful if the above construction could be used?
But as I said before. Implementing a mutex on a VAX, is really simple.
Sure. If you only care about slow uniprocessors, implementing a mutex
Huh? The above code is MP safe, and *really* simple. Why do you assume
that I'm only talking uniprocessor code here?
Just to be clear - I'm still not in any way saying that machines using
CAS today should stop using it. What I was asking for was just a
smoother way for machines that don't have CAS to have a way to implement
the services used in other ways that would make more sense on those
machines, while still providing the same end result.
Without CAS or an equivalent instruction, you pretty much can't do lockless
update to datastructures. I would not expect to see use of CAS disappear
from the NetBSD kernel, since on modern architectures one really wants to
do more things without taking locks whenever possible -- not less.
Agreed. But implementing a lock can be done without CAS.
Given that, I can't see the benefit to removing one particular use of
atomic CAS in our kernel -- and I can see a great deal of reason _not_ to
prohibit MI code from creating more.
Removing CAS from the locking code is a place where we could see serious
gain for architectures not having a CAS. At the same time, having a
design that allows you to use CAS for the locking code, if that fits the
bill means noone should loose from such a solution. There could (should)
only be winners.
Johnny Billquist || "I'm on a bus
|| on a psychedelic trip
email: bqt%softjar.se@localhost || Reading murder books
pdp is alive! || tryin' to stay hip" - B. Idol
Main Index |
Thread Index |