Re: "adapter resource shortage"?

To: Mouse <mouse%Rodents-Montreal.ORG@localhost>, tech-kern%NetBSD.org@localhost
Subject: Re: "adapter resource shortage"?
From: Brian Buhrow <buhrow%nfbcal.org@localhost>
Date: Tue, 5 Mar 2013 00:31:40 -0800

        Hello.  I think the problem is related to a mismatch between the way
the driver configures the controller and the drives and the scsipi layer's
knowledge of that configuration.  Your dmesg output shows the drives
running in asynchronous transfer mode, which I think means the scsipi layer
waits for acknowledgement from the drives before issuing follow-up commands
to them.  I see the bug reports listed in Hauke's mail and I think they
amount to the hardware not being programmed with optimal settings.  Since
we use the SCSI versions of these cards heavily, and also chafe at the lack
of performance, I'll look to see if I can glean any insights from the
FreeBSD, OpenBSD or Linux drivers.  Those sources have been helpful for
getting ideas on how to improve error handling in general and improve on
what they did.  I believe I'm close to a version that's pretty error proof,
at least when it comes to errors thrown by the mpt/LSI firmware itself.
Not quite commit ready, but I'm happy to send you patches against the 5.x
tree if you're interested.  Now that I see these bugs, I'll look further at
improving normal operations, which I didn't realize could be improved.
-thanks
-Brian

On Mar 5,  2:16am, Mouse wrote:
} Subject: Re: "adapter resource shortage"?
} > I've been working over the mpt(4) driver in 5.x heavily of late in an
} > effort to make it more robust in the face of errors and the like.  My
} > findings are that this error message comes from the scsipi layer of
} > the scsi stack and the 1 second delays you're seeing are the scsipi
} > subsystem freezing activity to the peripheral card in question so as
} > to give it time to drain its transaction queues.
} 
} That's approximately what I thought.
} 
} > My guess is that you don't want to use the hammer of breaking this
} > congestion control mechanism in the upper layers to fix a driver
} > problem.
} 
} I'd rather not, but it's got one major advantage, that being that I'm
} confident I can implement it.
} 
} I certainly could add printfs to find out where the
} XS_RESOURCE_SHORTAGE is coming from.
} 
} > [...]
} > I'm surprised you're seeing so many of these errors.  I wonder if the
} > scsi bus you're using is having termination issues or if the drives
} > themselves are going south.
} 
} Strikes me as unlikely, especially the former; the bus in question does
} not actually exist - they're SAS, not a real SCSI bus.
} 
} > How does the entire thing probe?  Are the drives negotiating a
} > synchronous transfer speed?
} 
} Cut down to just the branch of the tree leading to the sd devices:
} 
} mainbus0 (root)
} pci0 at mainbus0 bus 0: configuration mode 1
} pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
} ppb1 at pci0 dev 2 function 0: vendor 0x1022 product 0x7450 (rev. 0x13)
} pci2 at ppb1 bus 2
} pci2: i/o space, memory space enabled
} mpt0 at pci2 dev 3 function 0: vendor 0x1000 product 0x0050
} mpt0: interrupting at ioapic2 pin 0
} mpt0: Phy 0: Link Rate 3.0 Gbps
} mpt0: Phy 1: Link Rate 3.0 Gbps
} mpt0: Phy 2: Link Rate 3.0 Gbps
} mpt0: Phy 3: Link Rate 3.0 Gbps
} scsibus0 at mpt0: 108 targets, 8 luns per target
} scsibus0: waiting 2 seconds for devices to settle...
} sd0 at scsibus0 target 0 lun 0: <SEAGATE, ST973401LSUN72G, 0556> disk fixed
} sd0: 70007 MB, 14089 cyl, 24 head, 424 sec, 512 bytes/sect x 143374738 sectors
} sd0: async, 8-bit transfers, tagged queueing
} sd1 at scsibus0 target 1 lun 0: <SEAGATE, ST973401LSUN72G, 0556> disk fixed
} sd1: 70007 MB, 14089 cyl, 24 head, 424 sec, 512 bytes/sect x 143374738 sectors
} sd1: async, 8-bit transfers, tagged queueing
} sd2 at scsibus0 target 2 lun 0: <SEAGATE, ST973401LSUN72G, 0556> disk fixed
} sd2: 70007 MB, 14089 cyl, 24 head, 424 sec, 512 bytes/sect x 143374738 sectors
} sd2: async, 8-bit transfers, tagged queueing
} sd3 at scsibus0 target 3 lun 0: <SEAGATE, ST973401LSUN72G, 0556> disk fixed
} sd3: 70007 MB, 14089 cyl, 24 head, 424 sec, 512 bytes/sect x 143374738 sectors
} sd3: async, 8-bit transfers, tagged queueing
} 
} I can of course supply full boot-time messages if desired.
} 
} One thing that I suspect is unlikely to be relevant, but might be - CPU
} 0 does report
} 
} cpu0 at mainbus0 apid 0: AMD 686-class, 2592MHz, id 0x20f12
} cpu0: erratum 89 present
} cpu0: WARNING: errata present, BIOS upgrade may be
} cpu0: WARNING: necessary to ensure reliable operation
} 
} Oddly enough, the other three don't, even though as far as I can tell
} they're two identical dual-core CPUs.  This latter leads me to wonder
} if the erratum may be related to interrupt handling, which could
} conceivably be related to this....
} 
} /~\ The ASCII                           Mouse
} \ / Ribbon Campaign
}  X  Against HTML              mouse%rodents-montreal.org@localhost
} / \ Email!         7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
>-- End of excerpt from Mouse
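
For anyone following the thread who wants a picture of the congestion
control described above, here is a toy model in C.  It is only a sketch
under stated assumptions and is not the actual scsipi or mpt(4) code: every
name in it (toy_channel, toy_adapter_submit, toy_midlayer_issue, toy_tick,
TOY_FREEZE_TICKS) is made up for illustration, and one "tick" merely stands
in for the 1 second delay mentioned in the thread.  The idea it shows is
that when the adapter reports a resource shortage, the upper layer holds
further commands for a while so the adapter can drain its queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: how many ticks the upper layer backs off after a shortage. */
#define TOY_FREEZE_TICKS 1

/* Hypothetical status codes; the thread names only XS_RESOURCE_SHORTAGE. */
enum toy_status { TOY_OK, TOY_RESOURCE_SHORTAGE };

/* Toy stand-in for one adapter channel as seen by the upper layer. */
struct toy_channel {
    int openings;      /* free command slots on the adapter */
    int busy;          /* commands the adapter is currently working on */
    int freeze_ticks;  /* > 0 while the upper layer is holding commands back */
    int shortages;     /* how many shortages have been reported so far */
};

/* "Driver" side: accept a command if a slot is free, else report shortage. */
enum toy_status toy_adapter_submit(struct toy_channel *ch)
{
    if (ch->openings == 0)
        return TOY_RESOURCE_SHORTAGE;
    ch->openings--;
    ch->busy++;
    return TOY_OK;
}

/*
 * "Upper layer" side: try to issue one command.  Returns true if the
 * adapter took it, false if the caller has to keep it queued and retry.
 */
bool toy_midlayer_issue(struct toy_channel *ch)
{
    if (ch->freeze_ticks > 0)
        return false;                        /* frozen: do not touch the adapter */
    if (toy_adapter_submit(ch) == TOY_RESOURCE_SHORTAGE) {
        ch->freeze_ticks = TOY_FREEZE_TICKS; /* back off so the queue can drain */
        ch->shortages++;
        return false;
    }
    return true;
}

/* One tick passes: the adapter finishes up to `done` commands. */
void toy_tick(struct toy_channel *ch, int done)
{
    assert(done >= 0);
    if (done > ch->busy)
        done = ch->busy;                     /* cannot finish more than is busy */
    ch->busy -= done;
    ch->openings += done;
    if (ch->freeze_ticks > 0)
        ch->freeze_ticks--;                  /* freeze timer counts down */
}
```

With two openings, the first two calls to toy_midlayer_issue() succeed, the
third hits the shortage and freezes the channel, and further calls are
refused without bothering the adapter until toy_tick() has run.  In this
model, each shortage costs a full freeze period no matter how quickly the
adapter frees a slot.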

Follow-Ups:
- Re: "adapter resource shortage"?
  - From: Michael van Elst

References:
- Re: "adapter resource shortage"?
  - From: Mouse
