Current-Users archive
Re: 4.0 system wedge, scsipi_xfer pool exhaustion?
On Wed, Sep 03, 2008 at 12:23:00PM +0200, Havard Eidnes wrote:
> Hi,
>
> a system we have in operation locally was recently upgraded from
> netbsd-3 to netbsd-4 (4.0_STABLE). After this upgrade it has now
> wedged twice, usually at times of relatively high disk I/O
> activity. The last time it wedged, I found this in the console
> log:
>
> raid1: IO failed after 5 retries.
> raid1: IO failed after 5 retries.
> sd3(ahd1:0:3:0): unable to allocate scsipi_xfer
>
> Unfortunately, it was power-cycled before I could take a closer
> look.
>
> Now, the scsipi layer appears to be of the opinion that failure
> to allocate a scsipi_xfer pool item under PR_NOWAIT is supposed
> to be a non-fatal problem. However, raidframe or the upper-layer
> user of raidframe appears to be of the opposite opinion.
This may be because raidframe also failed to allocate memory (perhaps
from the buffer pool?). I don't think a failure to allocate a
scsipi_xfer in the normal read path can lead to a biodone() with
b_error set. The buffer may stay in the queue forever, though, if no
free memory shows up.
>
> Any suggestions for what should be done to make this machine a
> little more stable? Pre-allocate more than a page's worth of
> scsipi_xfer items using pool_prime() in scsipi_init()?
I'm not sure this would help. But maybe we should add something like
pool_setlowat(&scsipi_xfer_pool, PAGE_SIZE / sizeof(struct scsipi_xfer) / 2).
If I understood it properly, pool_prime() doesn't prevent the page from
being recycled if the system is low on memory and all scsipi_xfers
are free.
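
To make the difference concrete, here is a small userland toy model of
the behaviour described above. It is only a sketch and not the pool(9)
implementation: every toy_* name is hypothetical, the model counts items
rather than whole pages, and it assumes (as the paragraph above does,
with the same "if I understood it properly" caveat) that primed items
stay reclaimable while a low-water mark holds some back.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userland toy model, NOT the NetBSD pool(9) code.
 * All names are hypothetical and exist only for illustration.
 */
struct toy_pool {
	size_t item_size;	/* bytes per item */
	size_t page_size;	/* bytes per backing page */
	size_t nfree;		/* idle items currently cached */
	size_t lowat;		/* low-water mark, in items */
};

/* Items that fit in one page: the PAGE_SIZE / sizeof(item) term. */
static size_t
toy_items_per_page(const struct toy_pool *p)
{
	assert(p->item_size > 0 && p->item_size <= p->page_size);
	return p->page_size / p->item_size;
}

/* The low-water mark suggested above: half a page's worth of items. */
static size_t
toy_half_page_lowat(const struct toy_pool *p)
{
	return toy_items_per_page(p) / 2;
}

/* Pre-allocate n idle items (assumed to be all that priming does). */
static void
toy_prime(struct toy_pool *p, size_t n)
{
	p->nfree += n;
}

/*
 * Model memory pressure: idle items are given back, but never
 * below the low-water mark. Returns the number of items released.
 */
static size_t
toy_reclaim(struct toy_pool *p)
{
	size_t released = p->nfree > p->lowat ? p->nfree - p->lowat : 0;

	p->nfree -= released;
	return released;
}
```

With made-up sizes (256-byte items, 4096-byte pages), priming 16 items
and then reclaiming with lowat == 0 leaves nothing cached, while setting
lowat to toy_half_page_lowat() first leaves 8 items available for the
next I/O. The real sizes depend on the kernel build and architecture.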
--
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
NetBSD: 26 years of experience will always make the difference