tech-kern archive


Re: "adapter resource shortage"?



        hello.  I've been working over the mpt(4) driver in 5.x heavily of
late in an effort to make it more robust in the face of errors and the
like.  My findings are that this error message comes from the scsipi layer
of the scsi stack and the 1 second delays you're seeing are the scsipi
subsystem freezing activity to the peripheral card in question so as to
give it time to drain its transaction queues.  My guess is that you don't
want to use the hammer of breaking this congestion control mechanism in the
upper layers to fix a driver problem.  A better solution, I think, is to
find out where, exactly, the resource shortage is coming into play.  The
mpt(4) driver appears to throw this error up to the scsipi layer when:

O  the IOC firmware reports that it is busy.

O  the driver cannot allocate a request buffer for the requested
transaction.

O  the DMA load of the final answer from the IOC firmware fails.

There are no print statements in the mpt(4) driver around any of those
events, so you may need to do a bit of work to see which one, exactly, is
throwing the error.
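
As a starting point for that work, here is a minimal sketch of per-cause
counters, one for each of the three cases listed above.  It is plain
userland C so it can be tried outside the kernel, and every name in it
(shortage_cause, note_shortage, format_shortage_summary) is made up for
illustration; none of them come from the mpt(4) source.  Counting rather
than printing on every event is an assumption on my part, chosen so the
console isn't flooded while a transfer is running.

```c
#include <stdio.h>

/* Hypothetical labels for the three paths listed above; not mpt(4) names. */
enum shortage_cause {
	CAUSE_IOC_BUSY,		/* IOC firmware reported that it is busy */
	CAUSE_NO_REQUEST,	/* no request buffer could be allocated */
	CAUSE_DMA_LOAD_FAIL,	/* DMA load of the reply failed */
	CAUSE_COUNT
};

static const char *const cause_name[CAUSE_COUNT] = {
	"IOC busy",
	"no request buffer",
	"DMA load failed"
};

/* One running total per cause. */
static unsigned long cause_hits[CAUSE_COUNT];

/*
 * Record one occurrence of the given cause and return its new total.
 * The idea is to call this at each site just before the error is
 * handed up to the upper layer.
 */
static unsigned long
note_shortage(enum shortage_cause c)
{
	return ++cause_hits[c];
}

/*
 * Format a one-line summary of all counters into buf.
 * Returns whatever snprintf() returns.
 */
static int
format_shortage_summary(char *buf, size_t len)
{
	return snprintf(buf, len, "%s=%lu %s=%lu %s=%lu",
	    cause_name[CAUSE_IOC_BUSY], cause_hits[CAUSE_IOC_BUSY],
	    cause_name[CAUSE_NO_REQUEST], cause_hits[CAUSE_NO_REQUEST],
	    cause_name[CAUSE_DMA_LOAD_FAIL], cause_hits[CAUSE_DMA_LOAD_FAIL]);
}
```

Dumping the summary line once the transfer has finished should show which
of the three paths dominates.  In the driver itself the counters would
presumably need to be per-adapter and updated under whatever lock already
protects the request path.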

        I'm surprised you're seeing so many of these errors.  I wonder if
the scsi bus you're using is having termination issues or if the drives
themselves are going south.  How does the entire thing probe?  Are the
drives negotiating a synchronous transfer speed?

-Brian

On Mar 5, 12:06am, Mouse wrote:
} Subject: "adapter resource shortage"?
} I recently started trying to run a Sun Fire X4100, with amd64 5.2.
} 
} I did a large data transfer to it, to a filesystem (FFS with no
} particular options) mounted -o async.  The machine has way more than
} enough RAM to buffer everything I copied, so the copy completed
} reasonably quickly, limited mostly by the sending system's disk (the
} network was running gigabit).  But, all through it, I was seeing
} 
} sd1(mpt0:0:1:0): adapter resource shortage
} 
} appearing once a second on the console (sd1 is the drive I was copying
} to, the one mounted -o async; the OS is on sd0).  These stopped when
} the transfer finished; when I told it to sync, in preparation for
} unmounting, they started again, and, watching the disk's busy light, I
} would estimate it is busy between 1/3 and 1/2 the time with a cycle
} time of about 1Hz, which is cripplingly inefficient.
} 
} Reading the code leads me to suspect this is a perfectly normal
} resource shortage in the presence of more transfers pending than the
} hardware can handle.  However, arguing against this are (a) that
} someone felt that message worth printing and (b) that the recovery
} mechanism is a huge performance-killer, apparently locking up all
} transfers to that drive for an entire second.  (At least, that's what
} the code appears to be doing, and it matches well enough with the
} behaviour I saw.)
} 
} I'm tempted to rip out the message entirely and decrease the wait time
} drastically, probably somewhere in the 10-to-100 millisecond range, so
} that it normally wakes up before the previous transfers have completely
} drained.  But I am hesitant to do this without having some idea why
} it's set up the way it is.  (It'd be a gross kludge anyway; the right
} answer, it seems to me, would be to issue transfers as the existing
} ones finish, rather than just waiting and retrying.  But it would be an
} acceptable workaround in this case.)
} 
} So, what's the scoop with this?  Would rolling back to 4.0.1 help any?
} It is known to run at least minimally on that hardware, though I
} haven't stress-tested it enough to know whether it'd exhibit the same
} issue.  (A quick glance at the code leads me to suspect it would.)
} 
} /~\ The ASCII                           Mouse
} \ / Ribbon Campaign
}  X  Against HTML              mouse%rodents-montreal.org@localhost
} / \ Email!         7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
>-- End of excerpt from Mouse



