Current-Users archive


Re: 4.0 system wedge, scsipi_xfer pool exhaustion?



On Wed, Sep 03, 2008 at 12:23:00PM +0200, Havard Eidnes wrote:
> Hi,
> 
> a system we have in operation locally was recently upgraded from
> netbsd-3 to netbsd-4 (4.0_STABLE).  After this upgrade it has now
> wedged twice, usually at times of relatively high disk I/O
> activity.  The last time it wedged, I found this in the console
> log:
> 
> raid1: IO failed after 5 retries.
> raid1: IO failed after 5 retries.
> sd3(ahd1:0:3:0): unable to allocate scsipi_xfer
> 
> Unfortunately, it was power-cycled before I could take a closer
> look.
> 
> Now, the scsipi layer appears to be of the opinion that failure
> to allocate a scsipi_xfer pool item under PR_NOWAIT is supposed
> to be a non-fatal problem.  However, raidframe or the upper-layer
> user of raidframe appears to be of the opposite opinion.

This may be because raidframe itself failed to allocate memory (from the
buffer pool, maybe?). I don't think a failure to allocate a scsipi_xfer
in the normal read path can lead to a biodone() with b_error set. The
buffer may stay in the queue forever, though, if no free memory shows up.

> 
> Any suggestions for what should be done to make this machine a
> little more stable?  Pre-allocate more than a page's worth of
> scsipi_xfer items using pool_prime() in scsipi_init()?

I'm not sure this would help. But maybe we should add something like
pool_setlowat(&scsipi_xfer_pool, PAGE_SIZE / sizeof(struct scsipi_xfer) / 2).
If I understand it correctly, pool_prime() doesn't prevent the page from
being recycled if the system is low on memory and all scsipi_xfers
are free.

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 years of experience will always make the difference