On 2010-11-13 04:17, Matt Thomas wrote:
On Nov 12, 2010, at 6:49 PM, Johnny Billquist wrote:On 2010-11-13 03:26, Thor Lancelot Simon wrote:On Sat, Nov 13, 2010 at 01:11:01AM +0100, Johnny Billquist wrote:It's pretty clear, looking at the VAX architecture and many of the early VAX models, that these kinds of issues were not really high on the agenda for the original VAX architects, if they were on the agenda at all. Surely people at IBM and elsewhere had explored them and various publications in the mid/late 80's and 1990s (like the Synthesis papers) generally raised awareness of them in the Unix community; but there's a _lot_ of old hardware out there that really gets it wrong (R2000, anyone?) and almost no modern hardware that does.Just read the VAX Architecture Reference Manual, to see that DEC was indeed thinking about performance and problems of locks in multiprocessor systems. A typical spinlock on a VAX should be written as: 1: BBSSI $0,lock,2f BRB 3f 2: BBS $0,lock,2b BRB 1 3: the point being that you test and set the lock at first. If the lock is already held, you let the CPU spin on the lock, but just hitting the local cache, thereby not go hitting on a memory location that will stall other CPUs, nor bypass the local cache, or cause other problems. When another CPU writes to the lock word, the cache invalidation protocol between the CPUs will force the spinning CPU to reload the value from main memory again, and if the lock is still free, it will make a new attempt to grab the lock. That seems to me rather thought of, and the architecture designed for multiprocessor systems, and locks in them. And this was in the 80s. Heck, actually, it was the 70s.However, what we knew then about multiprocessor system at that time is very naive and simple compared to what we know today. And the VAX interlocked instructions show that. It's nice that the VAX architects thought about it but really, ADAWI being 16-bits wide?
Yes, not having a 32 bit interlocked add is sad. It would have been nice if a whole bunch of more instructions were available in interlocked versions. Since that means you could have done more things as atomic operations instead of having to use locks.
And I think we do, or atleast did, at some time manage to get some mutiprocessor VAXen off the ground in NetBSD?Yes we did.Most of the examples of how to do this or that with one VAX instruction that have been raised in this thread have turned out to not actually work. Matt did offer a rather wonderfully perverse way to do RAS...And I just once more demonstrated how to implement a lock on the VAX that is atomic. However, it is not anywhere near a CAS. Now, wouldn't it be wonderful if the above construction could be used?Except that it really isn't completely safe since if you get an interrupt that wants the same lock, you have a deadlock. And a true CAS doesn't have that problem.
Eh...? Now I'm not following you.Yes, if you grab a lock in some code, and then get an interrupt, and in that interrupt routine, you try to grab the same lock, you loose.
That should be pretty obvious. How does using CAS to implement a lock change that???
But as I said before. Implementing a mutex on a VAX, is really simple.Sure. If you only care about slow uniprocessors, implementing a mutex is simple.Huh? The above code is MP safe, and *really* simple. Why do you assume that I'm only talking uniprocessor code here?If you don't have to worry about synchronization on multiple IPLs.
If you just are referring to the fact that you can get deadlocks, then that is always an issue with IPL on your own CPU. And that is why our spin lock mutexes have an associated IPL, which we must go up to when grabbing one as well. This is not some inherent problem with the VAX locking code solution, but a problem with locks/mutexes in relationship to interrupts, no matter how you implement your locks/mutexes.
Just to be clear - I'm still not in any way saying that machines using CAS today should stop using it. What I was asking for was just a smoother way for machines that don't have CAS to have a way to implement the services used in other ways that would make more sense on those machines, while still providing the same end result.Without CAS or an equivalent instruction, you pretty much can't do lockless update to datastructures. I would not expect to see use of CAS disappear from the NetBSD kernel, since on modern architectures one really wants to do more things without taking locks whenever possible -- not less.Agreed. But implementing a lock can be done without CAS.I think the problem is that a mutex is more than a simple semaphore and the VAX interlocked instructions are just too primitive to implement it cleanly. At one time we had a very VAX-centric mutex implementation but it just didn't work correctly 100% of the time. So I choice slow correctness over fast incorrectness. If you can come up with a better algorithm, by all means do so. But it has be correct. The mutex and rwlock definitions are port-specific so you can change them but my experience found it wasn't worth it.
Please enlighten me to what the actual problem is that you seem to be referring to here. Since I don't think I have a clue. Combining rasing the SPL, and using bit test and set interlock instructions should be totally correct all the time. Just as much as a CAS would be (in combination with a raised SPL, or else the CAS solution will potentially deadlock on interrupts as well).
And I think that mutexes in NetBSD at the moment possibly can be implemented in a totally port-specific manner (still working on it), rwlocks can't. Unless you mean I should totally rip out kern_rwlock.c, and totally provide my own.
Given that, I can't see the benefit to removing one particular use of atomic CAS in our kernel -- and I can see a great deal of reason _not_ to prohibit MI code from creating more.Removing CAS from the locking code is a place where we could see serious gain for architectures not having a CAS. At the same time, having a design that allows you to use CAS for the locking code, if that fits the bill means noone should loose from such a solution. There could (should) only be winners.Eventually, most operations come down to compare and swap. It's just too damn useful to not have as a primitive. Even if some of the platforms have to emulate it somehow. Just because an architecture is 30+ years old doesn't mean we are forced to ignore algorithms that came after its birth.
Hmm. I don't agree. Some operations just comes down to a test and set. Like locks...
The VAX now has a fast non-MP emulation of atomic_cas so that should be less of an issue.
I will definitely see what that might do to help things. Johnny -- Johnny Billquist || "I'm on a bus || on a psychedelic trip email: bqt%softjar.se@localhost || Reading murder books pdp is alive! || tryin' to stay hip" - B. Idol