Current-Users archive


Re: Severe netbsd-6 NFS server-side performance issues



On Mon, 4 Jun 2012, Hauke Fath wrote:

> At 9:46 -0700 on 04.06.2012, Brian Buhrow wrote:
> >From the description, it sounds like the amr(4) driver is
> >really getting wedged somewhere and this is what's causing your problem.  The
> >question is whether the driver is the problem, the firmware on the raid
> >device or just the combination of the two.
> 
> You missed one question: Is the "amr0: bad status (not active; 0x040)"
> cause, or effect of the wedging? As I said, I get the occasional
> 
> [panic]
> dumping to dev 19,1 offset 313501
> dump 109 amr0: bad status (not active; 0x0416)
> amr0: bad status (not active; 0x0412)

Let's see...

That message comes from the ISR that walks the list of commands the 
firmware reports as completed and finishes them.  In this case, however, 
the command that the firmware claims to have completed is not actually 
in flight.

I think this means that the firmware sometimes gets confused and either 
loses or swaps command identifiers.  Your processes in the "D" state may 
be waiting for commands that the firmware thinks have completed but the 
driver does not.
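A minimal C sketch of the check described above, to make the hypothesis concrete. It is not the amr(4) code: the slot table, `NCMDS`, `cmd_submit()` and `isr_complete()` are hypothetical, and real completions arrive through the controller's mailbox rather than as a plain array of identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define NCMDS 32 /* hypothetical number of command slots */

struct cmd {
    bool active; /* posted to the firmware, completion not yet seen */
    bool done;   /* finished by the ISR; a waiter would be woken here */
};

struct cmd cmds[NCMDS];

/* Driver side: mark a slot as in flight before handing it to the firmware. */
void cmd_submit(unsigned id)
{
    /* Driver invariant: never reuse a slot that is still in flight. */
    assert(id < NCMDS && !cmds[id].active);
    cmds[id].active = true;
    cmds[id].done = false;
}

/*
 * ISR side: walk the identifiers the firmware reports as completed.
 * Each one is checked against the in-flight set before it is finished;
 * an identifier the driver does not have in flight is logged and skipped.
 * Returns the number of such bogus completions.
 */
int isr_complete(const uint8_t *ids, size_t n)
{
    int bogus = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned id = ids[i];

        if (id >= NCMDS || !cmds[id].active) {
            /* Illustrative message, loosely modelled on "bad status (not active)". */
            printf("bad status: completion for inactive command %u\n", id);
            bogus++;
            continue;
        }
        cmds[id].active = false;
        cmds[id].done = true; /* in a kernel driver, e.g. wakeup(9) the sleeper */
    }
    return bogus;
}
```

If the driver posts command 3 and the firmware reports 4 as completed, the ISR logs and discards 4, while 3 stays active with nothing left to complete it: exactly the situation of a process sleeping in "D" state.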

One solution to this problem is for the driver to keep a timeout for 
each outstanding command and, if one of them takes too long to complete 
(say, more than a minute), force a reset and start everything over again.
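The timeout idea could look roughly like the following sketch. Again the names (`struct wctlr`, `wcmd_submit()`, `watchdog_tick()`, `ctlr_reset()`) are hypothetical, time is passed in as plain seconds so the logic can be exercised outside the kernel, and the "reset" here only fails the outstanding commands. A real driver would also have to reinitialise the controller and would presumably run the scan from a callout(9) timer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define NCMDS 32
#define CMD_TIMEOUT_SEC 60 /* "say, more than a minute" */

enum cmd_state { CMD_FREE, CMD_ACTIVE, CMD_DONE, CMD_FAILED };

struct wcmd {
    enum cmd_state state;
    long submitted; /* seconds, from a hypothetical monotonic clock */
};

struct wctlr {
    struct wcmd cmds[NCMDS];
    int resets; /* number of forced resets so far */
};

/* Record when each command was handed to the firmware. */
void wcmd_submit(struct wctlr *c, unsigned id, long now)
{
    assert(id < NCMDS && c->cmds[id].state != CMD_ACTIVE);
    c->cmds[id].state = CMD_ACTIVE;
    c->cmds[id].submitted = now;
}

/* Hypothetical reset hook: fail everything still outstanding so that no
 * waiter sleeps forever; callers can retry once the controller is back. */
static void ctlr_reset(struct wctlr *c)
{
    c->resets++;
    for (size_t i = 0; i < NCMDS; i++)
        if (c->cmds[i].state == CMD_ACTIVE)
            c->cmds[i].state = CMD_FAILED;
}

/* Periodic watchdog scan; returns true if a reset was forced. */
bool watchdog_tick(struct wctlr *c, long now)
{
    for (size_t i = 0; i < NCMDS; i++) {
        if (c->cmds[i].state == CMD_ACTIVE &&
            now - c->cmds[i].submitted > CMD_TIMEOUT_SEC) {
            fprintf(stderr, "cmd %zu timed out after %ld s, resetting\n",
                    i, now - c->cmds[i].submitted);
            ctlr_reset(c);
            return true;
        }
    }
    return false;
}
```

Failing the stuck commands with an error, rather than leaving them active, is the point of the design: the waiting processes wake up and can retry instead of sitting in "D" state indefinitely.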

Eduardo

