Subject: Re: kern/29936: isp(4) with Qlogic 2312 FC HBA hangs with: "unable to load DMA (35)"
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: Greg A. Woods <woods@planix.com>
List: netbsd-bugs
Date: 04/11/2005 00:01:02
The following reply was made to PR kern/29936; it has been noted by GNATS.

From: "Greg A. Woods" <woods@planix.com>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: NetBSD GNATS submissions and followups <gnats-bugs@netbsd.org>,
	<kern-bug-people@NetBSD.org>,
	NetBSD GNATS Administrator <gnats-admin@NetBSD.org>
Subject: Re: kern/29936: isp(4) with Qlogic 2312 FC HBA hangs with: "unable to load DMA (35)"
Date: Sun, 10 Apr 2005 19:59:57 -0400 (EDT)

 [ On Sunday, April 10, 2005 at 22:42:12 (+0200), Manuel Bouyer wrote: ]
 > Subject: Re: kern/29936: isp(4) with Qlogic 2312 FC HBA hangs with: "unable to load DMA (35)"
 >
 > > 	    isp1: unable to load DMA (35)
 > 
 > This is EAGAIN. My guess is that pci_sgmap_pte64_load() is running
 > short of resources.
 
 Indeed, but why is it so "fatal" -- a "shortage" is not an "outage" and
 I wouldn't have thought it to be a permanent condition....
 
 > > 	    sd6(isp1:0:1:0): adapter resource shortage
 > 
 > the scsipi subsystem will sleep for one second and try again, 5 times.
 
 Which of course won't help if the "shortage" never goes away (in time?).
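 Here is a toy model of why a fixed number of retries only helps if
 something else gives map entries back in the meantime.  It is NOT the
 actual scsipi or sgmap code; struct sgmap_model, model_load() and
 model_load_with_retries() are made-up names for illustration, and the
 one-second sleep is simulated by simply returning some entries to the
 pool:

```c
#include <errno.h>

/* Hypothetical model of a shared scatter/gather map: a fixed number
 * of page-table entries that every device on the bus draws from. */
struct sgmap_model {
    int free_entries;
};

/* Try to map `need` entries; fail with EAGAIN if not enough are free. */
static int model_load(struct sgmap_model *m, int need)
{
    if (need > m->free_entries)
        return EAGAIN;
    m->free_entries -= need;
    return 0;
}

/* Retry policy similar in spirit to the one Manuel describes: up to
 * `tries` attempts, with a "sleep" between them during which other
 * users of the map give back `freed_per_wait` entries.  The real code
 * sleeps for a second; this model just adds the entries back. */
static int model_load_with_retries(struct sgmap_model *m, int need,
                                   int tries, int freed_per_wait)
{
    int error = EAGAIN;

    for (int i = 0; i < tries; i++) {
        error = model_load(m, need);
        if (error != EAGAIN)
            break;
        m->free_entries += freed_per_wait; /* stands in for the sleep */
    }
    return error;
}
```

 With freed_per_wait set to 0 (nothing else on the bus ever releasing
 its mappings), all five attempts fail with EAGAIN no matter how long
 each wait is, which is the situation I'm worried about.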
 
 
 > What is strange is that you say other isp devices don't have this problem.
 
 No, so far they haven't, though I wasn't going to let a sample of two
 settle that for certain.  :-)
 
 
 > If there is a resource shortage it should affect everyone using this sgmap.
 > If I understood it properly, the sgmap is per-tsp bus, which means that
 > the resource shortage is only for devices on the pci1 bus.
 > I see you have lots of network adapters on pci1; it's possible that their
 > drivers allocate DMA resources statically, causing this condition.
 > You should try to arrange to have all network devices on one PCI bus,
 > and all SCSI ones on the second PCI bus.
 
 Well that's a very good clue!  Thanks!
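 To make the per-bus point concrete, here is an equally hypothetical
 sketch, assuming one fixed-size pool per tsp bus.  The bus count, pool
 size, and the names bus_free, model_init() and model_alloc() are all
 invented for illustration and are not taken from the alpha sources:

```c
#include <errno.h>

#define MODEL_NBUSES 2

/* Hypothetical: one fixed-size sgmap per host bridge (bus). */
static int bus_free[MODEL_NBUSES];

/* Give every bus the same number of map entries. */
static void model_init(int entries_per_bus)
{
    for (int b = 0; b < MODEL_NBUSES; b++)
        bus_free[b] = entries_per_bus;
}

/* A device on bus `b` can only draw from that bus's own map, so a
 * greedy driver on one bus cannot hurt devices on the other. */
static int model_alloc(int b, int need)
{
    if (b < 0 || b >= MODEL_NBUSES)
        return EINVAL;
    if (need > bus_free[b])
        return EAGAIN;
    bus_free[b] -= need;
    return 0;
}
```

 For example, after model_init(8), a NIC driver that statically grabs 6
 entries on bus 1 leaves an isp request for 4 entries on bus 1 failing
 with EAGAIN, while the same request on bus 0 still succeeds.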
 
 Indeed the bge0 device on pci1 (along with isp1) is not being used,
 partly because it alone can trigger a very similar problem with DMA
 resources.  As I say, it's unused; however, I suppose some situation
 could somehow trigger it and cause it to try to allocate DMA buffers.
 As far as I know nobody had ifconfig'ed it before either hang, but it's
 possible someone or something activated it.  (However, the third crash,
 the one where everything hung completely, came, perhaps not
 coincidentally, right after I had run "pcictl pci0 list" to get the
 product code for the Qlogic card.)
 
 I thought I had applied Jason's patches from the "bge(4) (DEGXA-TX)
 no-go on the AlphaServer ES40" thread on tech-kern (& port-alpha) to the
 1.6.x code too, but it seems I had not, so the 1.6.x version definitely
 still causes problems on big-memory machines.
 
 I guess this all still boils down to needing a proper fix for PR# 28362
 as well as complete support for 64-bit DMA, so that 64-bit cards on
 64-bit systems like this one don't have to go through the sgmap at all.
 
 In the meantime I will remove the bge driver from the kernel entirely
 and hope that it was indeed the underlying cause.
 
 However, that still leaves wm0 (and the unused wm1) on pci1 along with
 isp1.  I'm not very comfortable with moving all the isp and ahc devices
 to one bus just to put the network devices alone on the other, but I
 suppose if that's what it takes....  I won't know for sure whether
 removing bge fixes it until at least a couple of weeks go by without
 further problems along these lines.
 
 (Note that I cannot bring up wm1 concurrently with wm0 with this kernel;
 I encounter a similar DMA resource problem....  I'm not even sure it
 worked with a -current kernel.  I didn't need a dual-port card, but they
 were the same price as the single-port ones, and a dual is of more use
 in other kinds of machines if I can ever get the bge to work again, and
 if we ever get a copper GigE port on the/a switch to connect it to.  In
 the meantime, even without the DMA resource issues, the bge driver still
 only goes about half the speed of the wm driver.  :-)
 
 -- 
 						Greg A. Woods
 
 H:+1 416 218-0098  W:+1 416 489-5852 x122  VE3TCP  RoboHack <woods@robohack.ca>
 Planix, Inc. <woods@planix.com>          Secrets of the Weird <woods@weird.com>