tech-net archive


Re: [Greg Troxel] Dell R610 lockups?



If these are anything like the Dell R710s, you may also wish to go into the 
BIOS and disable "C1E" and "C states".

I've seen those cause problems with (very) old Linux kernels, and I've seen 
reports of them causing similar problems with (not-the-latest) OpenSolaris.

Updating the Broadcom firmware wouldn't hurt, either. (Note that that's a 
separate firmware image.)


--- On Wed, 8/25/10, Christos Zoulas <christos%astron.com@localhost> wrote:

> From: Christos Zoulas <christos%astron.com@localhost>
> Subject: Re: [Greg Troxel] Dell R610 lockups?
> To: tech-net%netbsd.org@localhost
> Date: Wednesday, August 25, 2010, 5:20 AM
> In article <rmi39u3sell.fsf%fnord.ir.bbn.com@localhost>,
> Greg Troxel  <gdt%ir.bbn.com@localhost> wrote:
> >It's not clear if this is an amd64 issue or a network code issue, so
> >pointing it out here.
> >
> >Some colleagues at BBN have several Dell R610s, purchased fairly
> >recently.  They've been experiencing total hangs, from which they can
> >recover only with the power button (hold 4s).  ctrl-alt-esc works to get
> >into DDB, but after the hang ctrl-alt-esc does nothing.  The Dell boxes
> >are pretty normal, with single SATA disks, 4 on-board bnx and a 4-port
> >wm.
> >
> >The lockup happens with a netbsd-5 (RC3 I think) install cd for amd64
> >after doing an install and running from disk.  They haven't tried i386.
> >
> >It's not exactly clear what triggers the hang, but it seems to be
> >network traffic, with ping (sourcing and sinking) being worse than
> >forwarding.  A fairly reliable way to hose the machines is to hook up a
> >cat5 between two of them, ifconfig some addresses, and ping -f across
> >that.  RTT is an impressive 40us, but a lockup usually happens within 20
> >minutes.  Using a switch seems to make the hang less likely.  So I
> >wondered about a locking error triggered by tx complete interrupts
> >arriving in the middle of processing the next received packet.
> >
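
For anyone wanting to try the back-to-back test described above, here is a
rough sketch of the setup on one machine. The interface name wm0, the
10.0.0.x addresses, and the repro_cmds helper are placeholders of mine, not
details from the report. The function only prints the commands, so they can
be reviewed before being run as root.

```shell
#!/bin/sh
# Sketch of the back-to-back flood-ping test from the quoted report.
# Assumptions (not from the original post): interface wm0, addresses in
# 10.0.0.0/24, and the helper name repro_cmds. Nothing is executed here;
# the function only prints the commands to run as root on one machine.
repro_cmds() {
    iface=$1      # NIC cabled directly to the peer, e.g. wm0 or bnx0
    local_ip=$2   # address to assign to this machine
    peer_ip=$3    # address assigned on the other machine
    echo "ifconfig $iface inet $local_ip netmask 255.255.255.0 up"
    echo "ping -f $peer_ip"
}

# Machine A; on machine B, swap the two addresses.
repro_cmds wm0 10.0.0.1 10.0.0.2
```

Note that ping -f needs root, and per the report the hang may take 20
minutes or more to show up.
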
> >I suggested using LOCKDEBUG (and DIAGNOSTIC and DEBUG).  That runs ok
> >until it hangs :-) Can one enter DDB if the big kernel lock is taken and
> >not released?
> >
> >The machines were updated to the latest Dell BIOS; apparently there's a
> >Dell advisory about a Xeon firmware bug that results in Windows
> >bluescreens.
> >
> >Other than the lockup the machines are acting fine.  So I wonder if the
> >machines are buggy, or if there's a locking bug.
> >
> >I have not seen any postings about trouble with this kind of lockup in
> >NetBSD, and there are some posts of trouble with Linux on these Dell
> >machines.
> >
> >If someone has two beefy machines with bnx or wm and has a few minutes
> >to connect them with a cable, I'd be very curious to see what happens
> >after ping -f for several hours.
> >
> >Has anyone else had similar trouble?  Any clues of what to try?
> >
> 
> Boot Linux on both and try the same test.
> 
> christos
> 
> 




