Port-i386 archive


HT bug in some Intel CPUs?



Hi,
after fighting with an upgrade from NetBSD-3 to NetBSD-5/i386 on two
identical servers, I came to the conclusion that hyperthreading is
broken on this CPU, causing either corrupted registers or corrupted
memory reads (I couldn't determine which).
The CPU is:
cpu0: Intel (686-class), 3000.22 MHz, id 0xf4a
cpu0: features bfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features bfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu0: features bfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu0: features2 641d<SSE3,MONITOR,DS-CPL,CID,xTPR>
cpu0: features3 20100000<EM64T>
cpu0: "Intel(R) Xeon(TM) CPU 3.00GHz"
cpu0: I-cache 12K uOp cache 8-way
cpu0: L2 cache 2 MB 64B/line 8-way
cpu0: ITLB 4K/4M: 64 entries
cpu0: DTLB 4K/4M: 64 entries

To summarize my debugging session: from the symptoms, I concluded that
ci_ilevel was possibly not being restored properly, or was corrupted.
I added some checks to splraiseipl() and splx(), including this one in splx():
	if ((int)x < 0 || (int)x >= NIPL) {	\
		printf("splx(%d)\n", (int)x);	\
		panic("splx()");		\
	}					\

This check fires quickly after some activity (within minutes). In the one
instance where I printed x's value, it was -1 (in earlier attempts the check
was just a KASSERT).
splx() was always called from mutex_vector_exit() via MUTEX_SPIN_SPLRESTORE().
Looking at the lock from ddb, mtxs_ipl had the right value.
The other CPU was always in the process of acquiring a lock.
To me it looks like a hardware bug in the bus-locked operations, which
causes adjacent values to appear corrupted to the other CPU, perhaps only
for a short time. Another possibility is register corruption between the
two threads.
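The range check added to splx() can be tried outside the kernel. The following is a minimal user-space sketch, not NetBSD code: the sim_ functions, the sim_ilevel variable standing in for ci_ilevel, and the value of NIPL are all hypothetical. It shows that a level saved by splraiseipl() and later handed back to splx() must stay within [0, NIPL), so a restored value of -1 points to the saved value being damaged somewhere between the two calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical value: the real NIPL comes from the port's headers. */
#define NIPL 8

/* Simulated current interrupt priority level (stands in for ci_ilevel). */
static int sim_ilevel = 0;

/* Number of out-of-range values rejected by sim_splx(). */
static int sim_bad_restores = 0;

/* The same range test as the check added to splx(). */
static bool ipl_valid(int x)
{
	return x >= 0 && x < NIPL;
}

/* Raise the level to newipl if it is higher; return the previous level,
 * which the caller later hands back to sim_splx(). */
static int sim_splraiseipl(int newipl)
{
	int old = sim_ilevel;

	assert(ipl_valid(newipl));	/* KASSERT-style sanity check */
	if (newipl > old)
		sim_ilevel = newipl;
	return old;
}

/* Restore a saved level. Where the kernel check calls panic("splx()"),
 * this simulation only reports and counts the bad value, leaving the
 * current level unchanged. */
static void sim_splx(int x)
{
	if (!ipl_valid(x)) {
		fprintf(stderr, "splx(%d)\n", x);
		sim_bad_restores++;
		return;
	}
	sim_ilevel = x;
}
```

Nested raise/restore pairs unwind back to level 0, while calling sim_splx(-1) (the value seen in the report) is rejected and counted instead of being written to the level.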

Both servers are stable with a kernel using only one CPU (with HT still
enabled in the BIOS).

Has anyone else noticed something similar, or does anyone have information
about such a bug?

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 years of experience will always make the difference
--

