tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: Please do not yell at people for trying to help you.

On 2010-11-13 04:17, Matt Thomas wrote:

On Nov 12, 2010, at 6:49 PM, Johnny Billquist wrote:

On 2010-11-13 03:26, Thor Lancelot Simon wrote:
On Sat, Nov 13, 2010 at 01:11:01AM +0100, Johnny Billquist wrote:

It's pretty clear, looking at the VAX architecture and many of the early
VAX models, that these kinds of issues were not really high on the agenda
for the original VAX architects, if they were on the agenda at all.  Surely
people at IBM and elsewhere had explored them and various publications in
the mid/late 80's and 1990s (like the Synthesis papers) generally raised
awareness of them in the Unix community; but there's a _lot_ of old hardware
out there that really gets it wrong (R2000, anyone?) and almost no modern
hardware that does.

Just read the VAX Architecture Reference Manual, to see that DEC was indeed 
thinking about performance and problems of locks in multiprocessor systems.

A typical spinlock on a VAX should be written as:

1:      BBSSI   $0,lock,2f
        BRB     3f
2:      BBS     $0,lock,2b
        BRB     1
the point being that you test and set the lock at first. If the lock is already 
held, you let the CPU spin on the lock, but just hitting the local cache, 
thereby not go hitting on a memory location that will stall other CPUs, nor 
bypass the local cache, or cause other problems.
When another CPU writes to the lock word, the cache invalidation protocol 
between the CPUs will force the spinning CPU to reload the value from main 
memory again, and if the lock is still free, it will make a new attempt to grab 
the lock.

That seems to me rather thought of, and the architecture designed for 
multiprocessor systems, and locks in them. And this was in the 80s.
Heck, actually, it was the 70s.

However, what we knew then about multiprocessor system at that time
is very naive and simple compared to what we know today.  And the
VAX interlocked instructions show that.  It's nice that the VAX
architects thought about it but really, ADAWI being 16-bits wide?

Yes, not having a 32 bit interlocked add is sad. It would have been nice if a whole bunch of more instructions were available in interlocked versions. Since that means you could have done more things as atomic operations instead of having to use locks.

And I think we do, or atleast did, at some time manage to get some 
mutiprocessor VAXen off the ground in NetBSD?

Yes we did.

Most of the examples of how to do this or that with one VAX instruction
that have been raised in this thread have turned out to not actually work.
Matt did offer a rather wonderfully perverse way to do RAS...

And I just once more demonstrated how to implement a lock on the VAX that is 
atomic. However, it is not anywhere near a CAS.
Now, wouldn't it be wonderful if the above construction could be used?

Except that it really isn't completely safe since if you get
an interrupt that wants the same lock, you have a deadlock.
And a true CAS doesn't have that problem.

Now I'm not following you.
Yes, if you grab a lock in some code, and then get an interrupt, and in that interrupt routine, you try to grab the same lock, you loose.
That should be pretty obvious.
How does using CAS to implement a lock change that???

But as I said before. Implementing a mutex on a VAX, is really simple.

Sure.  If you only care about slow uniprocessors, implementing a mutex
is simple.

Huh? The above code is MP safe, and *really* simple. Why do you assume that I'm 
only talking uniprocessor code here?

If you don't have to worry about synchronization on multiple IPLs.

If you just are referring to the fact that you can get deadlocks, then that is always an issue with IPL on your own CPU. And that is why our spin lock mutexes have an associated IPL, which we must go up to when grabbing one as well. This is not some inherent problem with the VAX locking code solution, but a problem with locks/mutexes in relationship to interrupts, no matter how you implement your locks/mutexes.

Just to be clear - I'm still not in any way saying that machines using
CAS today should stop using it. What I was asking for was just a
smoother way for machines that don't have CAS to have a way to implement
the services used in other ways that would make more sense on those
machines, while still providing the same end result.

Without CAS or an equivalent instruction, you pretty much can't do lockless
update to datastructures.  I would not expect to see use of CAS disappear
from the NetBSD kernel, since on modern architectures one really wants to
do more things without taking locks whenever possible -- not less.

Agreed. But implementing a lock can be done without CAS.

I think the problem is that a mutex is more than a simple semaphore
and the VAX interlocked instructions are just too primitive to
implement it cleanly.  At one time we had a very VAX-centric mutex
implementation but it just didn't work correctly 100% of the time.
So I choice slow correctness over fast incorrectness.  If you can
come up with a better algorithm, by all means do so.  But it has be
correct.  The mutex and rwlock definitions are port-specific so you
can change them but my experience found it wasn't worth it.

Please enlighten me to what the actual problem is that you seem to be referring to here. Since I don't think I have a clue. Combining rasing the SPL, and using bit test and set interlock instructions should be totally correct all the time. Just as much as a CAS would be (in combination with a raised SPL, or else the CAS solution will potentially deadlock on interrupts as well).

And I think that mutexes in NetBSD at the moment possibly can be implemented in a totally port-specific manner (still working on it), rwlocks can't. Unless you mean I should totally rip out kern_rwlock.c, and totally provide my own.

Given that, I can't see the benefit to removing one particular use of
atomic CAS in our kernel -- and I can see a great deal of reason _not_ to
prohibit MI code from creating more.

Removing CAS from the locking code is a place where we could see serious gain 
for architectures not having a CAS. At the same time, having a design that 
allows you to use CAS for the locking code, if that fits the bill means noone 
should loose from such a solution. There could (should) only be winners.

Eventually, most operations come down to compare and swap.  It's just too
damn useful to not have as a primitive.  Even if some of the platforms
have to emulate it somehow.  Just because an architecture is 30+ years old
doesn't mean we are forced to ignore algorithms that came after its birth.

Hmm. I don't agree. Some operations just comes down to a test and set. Like locks...

The VAX now has a fast non-MP emulation of atomic_cas so that should be
less of an issue.

I will definitely see what that might do to help things.


Johnny Billquist                  || "I'm on a bus
                                  ||  on a psychedelic trip
email:             ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol

Home | Main Index | Thread Index | Old Index