Port-amd64 archive


Re: BETA6.0 - AMD Opteron 6272 (16 core) multiprocessor config crashes

Tom Ivar Helbekkmo <tih%Hamartun.Priv.NO@localhost> writes:

> Tom Ivar Helbekkmo <tih%Hamartun.Priv.NO@localhost> writes:
>> One of the things it does is be an NFS server for a couple other
>> systems.  Lately, using a -current from April 10th, it's been hanging
>> itself up from time to time, during heavy disk access from clients.
>> Everything else still works, but anything that tries to access disk
>> locks.  No error messages on the console or in the logs.
> It happened again after updating to -current as of May 1st.  I've now
> booted it with SMP disabled.  If it stays up for a few days running on
> one CPU, that should be a reasonably good indication that the problem is
> SMP related.

This box has been getting these hangs annoyingly often, and has also
been dropping into the debugger on NMI from time to time.  One or the
other, most often the hang, would happen once or twice per day.  I never
wrote down any backtraces from the NMI traps, since I assumed a flaky
RAM module (odd in itself, as the machine has redundant, mirrored RAM),
but at least once I noticed that the kernel was doing something related
to memory mapping when it happened.

Then, Martin Husemann suggested I try disabling the direct map stuff, by
editing sys/arch/amd64/include/types.h, and getting rid of these defines,
near the end of the file:

#include "opt_xen.h"
#if defined(__x86_64__) && !defined(XEN)
#define __HAVE_DIRECT_MAP 1

I wrapped the block of four defines in an #if 0 / #endif pair, and built
a new kernel.  That was five days ago, and there hasn't been a single
incident since.
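
For anyone wanting to try the same thing, the change can be sketched
roughly as below.  This is a stand-alone illustration, not the actual
contents of types.h: only the one define quoted above is reproduced, the
other defines in the block are left as a placeholder comment, and
direct_map_enabled() is a hypothetical helper added purely to show that
the macro is no longer defined once the block is wrapped.

```c
/*
 * Stand-alone illustration only; this is NOT the real
 * sys/arch/amd64/include/types.h.  The real block holds four defines;
 * only the one quoted in the message is reproduced here.
 */
#if 0	/* direct map disabled for testing */
#if defined(__x86_64__) && !defined(XEN)
#define __HAVE_DIRECT_MAP 1
/* ... remaining defines of the block, not shown in the message ... */
#endif
#endif	/* 0 */

/*
 * Hypothetical helper (not part of NetBSD): reports whether
 * __HAVE_DIRECT_MAP survived preprocessing.
 */
static int
direct_map_enabled(void)
{
#ifdef __HAVE_DIRECT_MAP
	return 1;
#else
	return 0;
#endif
}
```

With the #if 0 wrapper in place the helper returns 0; removing the
wrapper on an x86_64 build without XEN defined would make it return 1.
In the real tree the effect is presumably that code conditional on
__HAVE_DIRECT_MAP is compiled out, which is why a full kernel rebuild is
needed after the edit.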

"The market" is a bunch of 28-year-olds who don't know anything. --Paul Krugman
