tech-kern archive


Re: A hint for my Core 2 Duo MP bug



Mindaugas,

This is kern/38798, though I am not sure I filed it under the right category.

> Is there a filed PR about this? If no, can you please file it. Can you
> provide more details about the problem (e.g. dmesg, backtrace if crashes)?
> Have you tried only amd64, or also i386?

Both. i386 and amd64 behave the same way, which makes me think the bug lies in the common x86 code, or in a file that is identical in both. But I am totally unable to work out what is going on there.

The problem itself is very simple. At boot, cpu0 starts correctly but cpu1 does not: I get the message "cpu1 failed to become ready". The kernel carries on for a while, then hangs before it forks init (usually while scanning the ATAPI bus).

Digging deeper, Andrew and I set up a basic trace in mptramp.S, more precisely in the cpu_spinup_trampoline function. None of the HALT macros is reached. The trace stays stuck at 40 FF FF for the whole delay loop, as if cpu_spinup_trampoline were never called or executed.

I have also poked around and deliberately introduced mistakes in the code, to see whether they had any effect or whether the second core was held in a permanent halt state. If I remove the switch to protected mode, nothing happens. But if I remove the .code16 directive ahead of the first part of the function, the computer enters a boot/reboot cycle. So it seems the second core does execute the code somehow, but long after the delay loop has expired.

Yet it is not a delay problem: I have increased the delay loop by one or two orders of magnitude, to no avail. Something else is strange: for a while, I could get both cores started this way. With the kernel compiled with the MPDEBUG option, it would drop into ddb after the "cpu1 failed to become ready" message; then simply typing "cont" would start cpu1 and resume a normal kernel boot with both cores enabled. This "workaround" ceased to work at some point.
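
To illustrate why a longer delay changes nothing here, below is a minimal sketch in C of a boot-CPU wait loop. It is not the NetBSD code: ap_ready, ap_entry and wait_for_ap are hypothetical names made up for this example, and the real kernel uses its own flags and a timed delay rather than a bare poll count. The point is only that the loop succeeds as soon as the second core reports in, and times out for any bound if the second core never runs its startup code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the "cpu1 is running" flag;
 * not the kernel's real data structure. */
atomic_bool ap_ready = false;

/* What the second core would do first, if it ever reached
 * its startup code (hypothetical helper). */
void ap_entry(void)
{
    atomic_store(&ap_ready, true);
}

/* Boot CPU side: poll the flag at most max_polls times.
 * Returns true if the second core reported in, false on timeout. */
bool wait_for_ap(unsigned long max_polls)
{
    for (unsigned long i = 0; i < max_polls; i++) {
        if (atomic_load(&ap_ready))
            return true;
    }
    return false;
}
```

With this sketch, wait_for_ap(1000) and wait_for_ap(100000) both return false as long as ap_entry() has not run, which matches what I observe: the size of the bound is irrelevant when the flag is never set in time.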

On the whole, it seems the second core is waiting for the first to do something, something that does not happen, or happens too late or out of sync. NetBSD 4 boots both cores correctly, and so do FreeBSD and Linux. Andrew suspected a BIOS problem, but since all the other OSes work correctly, I suspect that something has changed in the MP boot code that affects this machine in particular (but why?).

That is all I have so far. Thanks for your interest.

Vincent

