tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: Please do not yell at people for trying to help you.



On Nov 12, 2010, at 6:49 PM, Johnny Billquist wrote:

> On 2010-11-13 03:26, Thor Lancelot Simon wrote:
>> On Sat, Nov 13, 2010 at 01:11:01AM +0100, Johnny Billquist wrote:
>>> 
>>> I just wanted a mutex and a lock interface which I could inline, and
>>> which only had the idea that I wanted to take or release a mutex or a
>>> lock. Some additional functions, like upgrading and downgrading rwlocks
>>> is also perfectly fine, as well as just trying to take locks, and to be
>>> able to query the lock/mutex for various information.
>>> 
>>> For some reason, this wish is apparently an horrible idea to most people
>>> around here. For that I can only say that I'm sorry for you.
>> 
>> Among other things, that interface has to be designed so that it:
>> 
>>      1) Is efficient and correct on multiprocessor systems with
>>         considerably divergent memory and bus ordering constraints
>>         and coherency models.
>> 
>>      2) Does not needlessly smash cache lines and stall modern CPUs
>>         waiting to access main memory.
>> 
>> Further, you also seem to be asking for less -- or no? -- use of atomic
>> CAS in the MI parts of the kernel, since VAX doesn't have it.  But at that
>> point you're basically headed for a policy where you can't use lockless
>> concurrent algorithms at all -- fine on a uniprocessor like the VAX models
>> we support, where memory and the bus aren't so many cycles away anyhow; not
>> fine at all on most modern systems, whether MIPS, powerpc, ARM, x86, etc.
> 
> While I did call out that atomic_cas was used explicitly in a few parts of 
> the kernel, I was mostly asking if it was really needed there, with the 
> unspoken assumption that is probably was, my main issues the whole time have 
> been about locks and mutexes.
> As such, the argument about lockless algorithms are hardly relevant, as a 
> lock is by definition not a lockless thing.
> 
> And while I agree that lockless algorithms are nice, they don't really help 
> much in a discussion about problems with our lock implementation.
> And I'm focusing on the lock implementation, as that is something that is 
> executed a lot, and I seem to observe a significant slowdown in the system, 
> even without much user activity going on. With a system that indicates that 
> the CPU is mostly idle, I still can observe slowness. And if I start doing 
> something heavy, I observice rather a lot of system time, as well as more 
> interrupt time than in the past.
> 
> But as I said before, I'm open to other suggestions on where to hunt for the 
> lost (wasted?) resources.
> 
>> It's pretty clear, looking at the VAX architecture and many of the early
>> VAX models, that these kinds of issues were not really high on the agenda
>> for the original VAX architects, if they were on the agenda at all.  Surely
>> people at IBM and elsewhere had explored them and various publications in
>> the mid/late 80's and 1990s (like the Synthesis papers) generally raised
>> awareness of them in the Unix community; but there's a _lot_ of old hardware
>> out there that really gets it wrong (R2000, anyone?) and almost no modern
>> hardware that does.
> 
> Just read the VAX Architecture Reference Manual, to see that DEC was indeed 
> thinking about performance and problems of locks in multiprocessor systems.
> 
> A typical spinlock on a VAX should be written as:
> 
> 1:    BBSSI   $0,lock,2f
>       BRB     3f
> 2:    BBS     $0,lock,2b
>       BRB     1
> 3:
> the point being that you test and set the lock at first. If the lock is 
> already held, you let the CPU spin on the lock, but just hitting the local 
> cache, thereby not go hitting on a memory location that will stall other 
> CPUs, nor bypass the local cache, or cause other problems.
> When another CPU writes to the lock word, the cache invalidation protocol 
> between the CPUs will force the spinning CPU to reload the value from main 
> memory again, and if the lock is still free, it will make a new attempt to 
> grab the lock.
> 
> That seems to me rather thought of, and the architecture designed for 
> multiprocessor systems, and locks in them. And this was in the 80s.
> Heck, actually, it was the 70s.

However, what we knew then about multiprocessor system at that time
is very naive and simple compared to what we know today.  And the
VAX interlocked instructions show that.  It's nice that the VAX 
architects thought about it but really, ADAWI being 16-bits wide?

> And I think we do, or atleast did, at some time manage to get some 
> mutiprocessor VAXen off the ground in NetBSD?

Yes we did.

>> Most of the examples of how to do this or that with one VAX instruction
>> that have been raised in this thread have turned out to not actually work.
>> Matt did offer a rather wonderfully perverse way to do RAS...
> 
> And I just once more demonstrated how to implement a lock on the VAX that is 
> atomic. However, it is not anywhere near a CAS.
> Now, wouldn't it be wonderful if the above construction could be used?

Except that it really isn't completely safe since if you get
an interrupt that wants the same lock, you have a deadlock.
And a true CAS doesn't have that problem.

>>> But as I said before. Implementing a mutex on a VAX, is really simple.
>> 
>> Sure.  If you only care about slow uniprocessors, implementing a mutex
>> is simple.
> 
> Huh? The above code is MP safe, and *really* simple. Why do you assume that 
> I'm only talking uniprocessor code here?

If you don't have to worry about synchronization on multiple IPLs.

>>> Just to be clear - I'm still not in any way saying that machines using
>>> CAS today should stop using it. What I was asking for was just a
>>> smoother way for machines that don't have CAS to have a way to implement
>>> the services used in other ways that would make more sense on those
>>> machines, while still providing the same end result.
>> 
>> Without CAS or an equivalent instruction, you pretty much can't do lockless
>> update to datastructures.  I would not expect to see use of CAS disappear
>> from the NetBSD kernel, since on modern architectures one really wants to
>> do more things without taking locks whenever possible -- not less.
> 
> Agreed. But implementing a lock can be done without CAS.

I think the problem is that a mutex is more than a simple semaphore
and the VAX interlocked instructions are just too primitive to 
implement it cleanly.  At one time we had a very VAX-centric mutex
implementation but it just didn't work correctly 100% of the time.
So I choice slow correctness over fast incorrectness.  If you can 
come up with a better algorithm, by all means do so.  But it has be
correct.  The mutex and rwlock definitions are port-specific so you
can change them but my experience found it wasn't worth it.

>> Given that, I can't see the benefit to removing one particular use of
>> atomic CAS in our kernel -- and I can see a great deal of reason _not_ to
>> prohibit MI code from creating more.
> 
> Removing CAS from the locking code is a place where we could see serious gain 
> for architectures not having a CAS. At the same time, having a design that 
> allows you to use CAS for the locking code, if that fits the bill means noone 
> should loose from such a solution. There could (should) only be winners.

Eventually, most operations come down to compare and swap.  It's just too
damn useful to not have as a primitive.  Even if some of the platforms
have to emulate it somehow.  Just because an architecture is 30+ years old
doesn't mean we are forced to ignore algorithms that came after its birth.

The VAX now has a fast non-MP emulation of atomic_cas so that should be
less of an issue.



Home | Main Index | Thread Index | Old Index