Subject: Re: Different speed CPUs show up as same speed
To: Perry E. Metzger <perry@wasabisystems.com>
From: John Klos <john@sixgirls.org>
List: tech-smp
Date: 06/17/2002 01:37:05
> > Your distinction isn't very important.
>
> On the contrary, it is critical. Huge numbers of people who never
> contribute code get angry when people who do don't do something they
> ask. "Why don't you support feature Z!" they demand.

I meant that your distinction isn't particularly important within the
context of what is being discussed here. I wasn't asking for a special
support group for differing-speed MP systems.

> > Unless someone can
> > illustrate a problem with running two different processors,
>
> We've given numerous statements of problems

Who has given statements of problems? All I see is speculation about
possible problems.

> 1) Locking, cache snooping, and other protocols aren't guaranteed to work
>    AT ALL between arbitrary processor steppings. If they happen to
>    work on your combination, bully for you, but the fact that Intel
>    has a compatibility matrix should tell you something.

Intel's documentation has much more to do with marketing than with
technical information; the fact that they have a compatibility matrix
tells me that they want me to buy more CPUs.

Look at it this way: a motherboard can support any of a number of
different processors. An old Slot 1 ("Nintendo cartridge") motherboard can
take a Pentium 2, a Pentium 3, or a Celeron on a slot-to-socket adapter card.
That motherboard has all the electronics for cache coherency, bus
snooping, arbitration, and so on. The CPUs are all built to work on the
same motherboards; an 800 MHz Pentium 3 on a 100 MHz bus is electrically
identical, as far as the motherboard is concerned, to a 450 MHz Pentium 2
on a 100 MHz bus. So, from the motherboard's point of view, the processors
ARE electrically the same, and therefore can work the same regardless of
stepping or speed. The software part is up to us.

> 2) Various kinds of system calibrated loops will fail. This can cause
>    serious bad mojo.

Bad programming. We don't encourage such programming in NetBSD, do we?

Another way to look at this: if calibrated loops are necessary and one
doesn't want to check which CPU the code is running on, then that code
should be made to run only on the boot processor.
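To put a number on the calibrated-loop problem: a busy-wait delay
calibrated on one CPU lasts a different wall-clock time on a CPU of
another speed. What follows is a user-space C sketch, not NetBSD kernel
code; struct sim_cpu, its fields, and both helpers are hypothetical, and
it assumes, purely for simplicity, one loop iteration per core clock
cycle. It contrasts reusing the boot processor's calibration with keeping
one calibration value per CPU, which is the "check which CPU the code is
running on" option.

```c
#include <assert.h>

/*
 * Hypothetical per-CPU record for illustration only; this is not
 * NetBSD's struct cpu_info.
 */
struct sim_cpu {
	unsigned mhz;            /* core clock of this CPU */
	unsigned loops_per_usec; /* delay-loop calibration measured on this CPU */
};

/*
 * Wall-clock time a given number of spin iterations takes on `runner`.
 * Simplifying assumption: one iteration per core cycle, so a CPU at
 * N MHz completes N iterations per microsecond.
 */
static double
sim_elapsed_usec(const struct sim_cpu *runner, unsigned long iterations)
{
	assert(runner->mhz > 0);
	return (double)iterations / runner->mhz;
}

/* Number of iterations a delay of `usec` spins for, given a calibration. */
static unsigned long
sim_delay_iterations(unsigned loops_per_usec, unsigned usec)
{
	return (unsigned long)loops_per_usec * usec;
}
```

With the 800 MHz and 450 MHz CPUs from the motherboard example, a
1000 usec delay calibrated on the 800 MHz boot CPU spins for 800000
iterations, which takes about 1778 usec on the 450 MHz CPU; with per-CPU
calibration both CPUs delay for 1000 usec. The reverse case, calibrating
on the slower CPU and running on the faster one, makes delays finish
early, which is the dangerous direction for hardware that needs a minimum
settle time.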

> 3) There can be problems with differing supported instruction sets and
>    register sets.

I guess you didn't read my last email thoroughly.

> Although you don't feel these are problems, others among us do.

I only see problems where there are problems; we're not talking about
hypothetical computers, we're talking about actual computers where we can
actually run code. I know that neither my kernels nor any of my binaries
are compiled to use SSE. I also know that most distributed binaries never
use such instructions.
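Differing instruction sets (point 3) can also be handled without
assuming identical CPUs: the kernel enables an optional feature only when
every CPU reports it. The C below is a sketch under that assumption; the
SIM_FEAT_* bits are made-up values, not real CPUID bit positions or
NetBSD feature variables, and sim_common_features() is a hypothetical
helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits for illustration; not actual CPUID bit positions. */
#define SIM_FEAT_MMX  (1u << 0)
#define SIM_FEAT_FXSR (1u << 1)
#define SIM_FEAT_SSE  (1u << 2)

/*
 * Return the features every CPU supports: the bitwise AND of each CPU's
 * feature mask. Anything outside this set must not be used by code that
 * may be scheduled on any CPU. With no CPUs, nothing is common.
 */
static uint32_t
sim_common_features(const uint32_t *cpu_features, size_t ncpu)
{
	uint32_t common = ~(uint32_t)0; /* start with "everything", then narrow */
	size_t i;

	assert(cpu_features != NULL || ncpu == 0);
	for (i = 0; i < ncpu; i++)
		common &= cpu_features[i];
	return ncpu > 0 ? common : 0;
}
```

For a Pentium 3 paired with a Pentium 2, the common set would exclude
SSE, since only the Pentium 3 has it, while the features both CPUs share
stay available.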

> No, it isn't supported. "Works for you" and "supported" are
> different. "Supported" doesn't mean "will boot and run". "Supported"
> means that the bulk of us working on NetBSD will care enough to help
> if it doesn't work or stops working. "Supported" also means "we
> recommend/encourage this usage".

Fine. So it's known to work, but not supported as in "Officially Approved
By The Entirety Of NetBSD". I don't see the point of making a big deal
about it. As I said before, it's not like we're making guarantees here or
anything.

> We don't encourage this bad idea. We say outright it is a bad idea. It
> isn't supported. If it "happens to work for you", well fine, but we
> aren't going to go out of our way to make sure it will work, or to
> make sure it continues to work.

Please illustrate how, aside from speculation, it's a bad idea.  Will you
also tell the people running SPARCs that what they're doing is bad? This
attitude is not a good one. Code should be correct no matter what.

An example where such assumptions would be bad: what if one CPU in a
dual-processor system started overheating and was automatically throttled
down? Would we want our kernel to panic because one CPU is now running at
a different speed than the other? I prefer correct code.
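The throttling case suggests treating CPU speed as a per-CPU value that
may change, and treating a mismatch as something to report rather than a
reason to panic. The same idea applies to the speed-reporting problem in
the subject line, where every CPU shows the boot CPU's speed. This is a
hypothetical C sketch; struct sim_cpu_speed and sim_report_speeds() are
illustrative names, not NetBSD interfaces, and the speeds are assumed to
have already been measured on each CPU.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-CPU speed record; not a NetBSD interface. */
struct sim_cpu_speed {
	unsigned id;
	unsigned mhz; /* current speed; may drop if the CPU is throttled */
};

/*
 * Report each CPU's own speed instead of copying the boot CPU's value.
 * Returns the number of CPUs whose speed differs from cpu0; a mismatch
 * is logged, never treated as fatal.
 */
static unsigned
sim_report_speeds(const struct sim_cpu_speed *cpus, unsigned ncpu)
{
	unsigned i, mismatched = 0;

	assert(ncpu > 0);
	for (i = 0; i < ncpu; i++) {
		printf("cpu%u: %u MHz\n", cpus[i].id, cpus[i].mhz);
		if (cpus[i].mhz != cpus[0].mhz)
			mismatched++;
	}
	if (mismatched > 0)
		printf("note: %u CPU(s) run at a different speed than cpu0\n",
		    mismatched);
	return mismatched;
}
```

Called with an 800 MHz cpu0 and a 450 MHz cpu1, it prints each CPU's own
speed, logs one mismatch, and returns 1; nothing in it is fatal.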

> If, for example, you said "hey, the change you just made broke my
> differing-CPU MP configuration", I doubt anyone would care enough to
> spend time debugging it for you, which is very different from what
> would happen if you noted that a supported configuration was
> broken. If you then provided a patch that made it work for you again
> and it caused no harm to anyone else, it might be accepted -- but it
> is unlikely anyone else would bother to create such a patch.
>
> That's what "not supported" means.
>
> So, to be clear, consensus seems to be "different CPUs running in MP
> is not supported". Not "will never happen to run". "Not supported".

When you make the distinction that way, fine; again, I care little about
semantics. So it's not "supported", but it certainly does run, and there
are no technical reasons it shouldn't.

I will test these machines with NetBSD as much as I can, and I will
even look at the CPU spinup code to see if I can correct the mostly
cosmetic speed-reporting bug. And if something should "break" so that
these systems no longer work, I will do my best to figure out what caused
the break and suggest alternative, or possibly "more correct", ways of
doing that thing.

Thanks,
John Klos
Sixgirls Computing Labs