Subject: port-i386/25670: Occasional "unable to allocate scsipi_xfer" error messages
To: None <gnats-bugs@gnats.netbsd.org>
From: None <he@netbsd.org>
List: netbsd-bugs
Date: 05/22/2004 12:29:51
>Number:         25670
>Category:       port-i386
>Synopsis:       Occasional "unable to allocate scsipi_xfer" error messages
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-i386-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat May 22 13:27:00 UTC 2004
>Closed-Date:
>Last-Modified:
>Originator:     Havard Eidnes
>Release:        NetBSD 2.0_BETA
>Organization:
>Environment:
System: NetBSD dolly.uninett.no 2.0_BETA NetBSD 2.0_BETA (DOLLY) #1: Fri May 21 18:31:09 CEST 2004  he@dolly.uninett.no:/usr/obj/sys/arch/i386/compile.i386/DOLLY i386
Architecture: i386
Machine: i386
>Description:
	An IBM 1U server configured as shown below with a single
	raidframe raid0	which concatenates sd0 and sd1 occasionally
	emits error messages of this type:

May 22 00:43:52 dolly /netbsd: sd3(mpt0:0:1:0): unable to allocate scsipi_xfer
May 22 00:43:54 dolly /netbsd: sd3: not queued, error 12
May 22 00:43:54 dolly /netbsd: sd3(mpt0:0:1:0): unable to allocate scsipi_xfer
May 22 00:43:54 dolly /netbsd: sd3: not queued, error 12
May 22 01:03:09 dolly /netbsd: sd3(mpt0:0:1:0): unable to allocate scsipi_xfer
May 22 01:03:11 dolly /netbsd: sd3: not queued, error 12

	This machine runs NetNews using innd with "cnfs" style article
	and overview storage (in /newsdata):

dolly# df
Filesystem  1K-blocks     Used     Avail Capacity  Mounted on
/dev/sd2a       79827    25168     50667    33%    /
/dev/sd2f     2979262   114705   2715593     4%    /var
/dev/sd2e    29913005  1678275  26739079     5%    /usr
mfs:344        297955        4    283053     0%    /tmp
kernfs              1        1         0   100%    /kern
/dev/raid0a 351828176 40005772 294230992    11%    /newsdata
/dev/sd3a    12386530  1473232  10293970    12%    /news
/dev/sd3e    22609434        2  21478960     0%    /spare
dolly#

	I've also had this happen to either sd0 or sd1, and raidframe
	then became quite upset, repeatedly logging

Mraid0: failed to create a dag. Too many component failures.

	Eventually the machine paniced with a kernel page fault trap
	(null pointer dereference?):

.
raid0: failed to create a dag. Too many component failures.
raid0: failed to create a dag. Too many component failures.
raid0: failed to create a dag. Too many component failures.
raid0: failed to create a dag. Too many component failures.
raid0: failed to create a dag. Too many component failures.
uvm_fault(0xd19ae8a4, 0, 0, 1) -> 0xe
kernel: page fault trap, code=0
Stopped in pid 19351.1 (tcsh) at  netbsd:bgetvp+0x19:   movl    0x3c(%ecx),%eax
db> 
db> trace
bgetvp(0,c201665c,0,0,ce834984) at netbsd:bgetvp+0x19
getblk(0,c0,0,8000,0) at netbsd:getblk+0x20b
bread(0,c0,0,8000,ffffffff) at netbsd:bread+0x53
ffs_update(ce834a64,2,0,ce834a78,c064dd00) at netbsd:ffs_update+0x2db
VOP_UPDATE(ce189024,0,0,0,0) at netbsd:VOP_UPDATE+0x36
.

	However, what I'm after is getting these apparent pool memory
	allocation errors to stop happning, since they seem to be the
	root cause of these problems.  "vmstat -m" output and reading
	of the source code indicates that these errors coincide with a
	failure by the pool allocator to get a scsipi_xfer structure:

Memory resource pool statistics
Name        Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
	...
scxspl       148 11813057    3 11813054  1755  1752     3    10     0   inf    2
	...

	and the code:

        xs = pool_get(&scsipi_xfer_pool,
            ((flags & XS_CTL_NOSLEEP) != 0 ? PR_NOWAIT : PR_WAITOK));
        if (xs == NULL) {
                if (flags & XS_CTL_URGENT) {
                        if ((flags & XS_CTL_REQSENSE) == 0)
                                periph->periph_flags &= ~PERIPH_RECOVERY_ACTIVE;
                } else
                        periph->periph_active--;
                scsipi_printaddr(periph); 
                printf("unable to allocate %sscsipi_xfer\n",
                    (flags & XS_CTL_URGENT) ? "URGENT " : "");
        }
        splx(s);

	I have "of course" no idea whether XS_CTL_NOSLEEP is set, but
	it is probable; I've bumped NKMEMPAGES from 32K to 48K, and
	the problem still occurs.  Shouldn't the pool be extended if
	there is no room in the currently allocated pages under all
	circumstances?!?

	Now, the big question is, what should be done to avoid having
	this problem reoccur?
	
	Boot messages from the machine:

NetBSD 2.0_BETA (DOLLY) #1: Fri May 21 18:31:09 CEST 2004
        he@dolly.uninett.no:/usr/obj/sys/arch/i386/compile.i386/DOLLY
total memory = 1023 MB
avail memory = 998 MB
BIOS32 rev. 0 found at 0xfd7d1
mainbus0 (root)
cpu0 at mainbus0: (uniprocessor)
cpu0: Intel Xeon (686-class), 3056.68 MHz, id 0xf29
cpu0: features bfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features bfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu0: features bfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu0: I-cache 12K uOp cache 8-way, D-cache 8 KB 64b/line 4-way
cpu0: L2 cache 512 KB 64b/line 8-way
cpu0: ITLB 4K/4M: 64 entries
cpu0: DTLB 4K/4M: 64 entries
cpu0: 16 page colors
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0
pchb0: ServerWorks product 0x0014 (rev. 0x32)
pchb1 at pci0 dev 0 function 1
pchb1: ServerWorks product 0x0014 (rev. 0x00)
pci1 at pchb1 bus 255
pci1: no spaces enabled!
pchb2 at pci0 dev 0 function 2
pchb2: ServerWorks product 0x0014 (rev. 0x00)
pci2 at pchb2 bus 1
pci2: no spaces enabled!
vga1 at pci0 dev 1 function 0: ATI Technologies Rage XL (rev. 0x27)
wsdisplay0 at vga1 kbdmux 1
wsmux1: connecting to wsdisplay0
ahc0 at pci0 dev 2 function 0: Adaptec 29160 Ultra160 SCSI adapter
ahc0: interrupting at irq 9
ahc0: aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
scsibus0 at ahc0: 16 targets, 8 luns per target
pchb3 at pci0 dev 15 function 0
pchb3: ServerWorks CSB5 SouthBridge (rev. 0x93)
pci3 at pchb3 bus 1
pci3: i/o space, memory space enabled
mpt0 at pci3 dev 1 function 0: LSI Logic 53c1030 Ultra320 SCSI
mpt0: interrupting at irq 9
scsibus1 at mpt0: 16 targets, 8 luns per target
rccide0 at pci0 dev 15 function 1
rccide0: ServerWorks CSB5 IDE Controller (rev. 0x93)
rccide0: bus-master DMA support present
rccide0: primary channel configured to compatibility mode
rccide0: primary channel interrupting at irq 14
atabus0 at rccide0 channel 0
rccide0: secondary channel configured to compatibility mode
rccide0: secondary channel interrupting at irq 15
atabus1 at rccide0 channel 1
ohci0 at pci0 dev 15 function 2: ServerWorks OSB4/CSB5 USB Host Controller (rev. 0x05)
ohci0: interrupting at irq 11
ohci0: OHCI version 1.0, legacy support
usb0 at ohci0: USB revision 1.0
uhub0 at usb0
uhub0: ServerWorks OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 4 ports with 4 removable, self powered
pcib0 at pci0 dev 15 function 3
pcib0: ServerWorks product 0x0225 (rev. 0x00)
pchb4 at pci0 dev 17 function 0
pchb4: ServerWorks CIOB-X2 (rev. 0x05)
pci4 at pchb4 bus 1
pci4: i/o space, memory space enabled
mpt1 at pci4 dev 1 function 0: LSI Logic 53c1030 Ultra320 SCSI
mpt1: unable to map device registers
pchb5 at pci0 dev 17 function 2
pchb5: ServerWorks CIOB-X2 (rev. 0x05)
pci5 at pchb5 bus 2
pci5: i/o space, memory space enabled
bge0 at pci5 dev 1 function 0: Broadcom BCM5703X Gigabit Ethernet
bge0: interrupting at irq 3
bge0: ASIC BCM5703 A2 (0x1002), Ethernet address 00:0d:60:1c:76:12
brgphy0 at bge0 phy 1: BCM5703 1000BASE-T media interface, rev. 2
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bge1 at pci5 dev 2 function 0: Broadcom BCM5703X Gigabit Ethernet
bge1: interrupting at irq 5
bge1: ASIC BCM5703 A2 (0x1002), Ethernet address 00:0d:60:1c:76:13
brgphy1 at bge1 phy 1: BCM5703 1000BASE-T media interface, rev. 2
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
isa0 at pcib0
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com0: console
pcppi0 at isa0 port 0x61
sysbeep0 at pcppi0
npx0 at isa0 port 0xf0-0xff: using exception 16
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
Kernelized RAIDframe activated
scsibus0: waiting 2 seconds for devices to settle...
scsibus1: waiting 2 seconds for devices to settle...
atapibus0 at atabus1: 2 targets
cd0 at atapibus0 drive 0: <LG CD-ROM CRN-8245B, , 1.16> cdrom removable
cd0: 32-bit data port
cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
cd0(rccide0:1:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33) (using DMA data transfers)
sd0 at scsibus0 target 0 lun 0: <SEAGATE, ST1181677LWV, 0002> disk fixed
sd0: 169 GB, 24247 cyl, 24 head, 609 sec, 512 bytes/sect x 354600001 sectors
sd0: sync (12.50ns offset 63), 16-bit (160.000MB/s) transfers, tagged queueing
sd1 at scsibus0 target 1 lun 0: <SEAGATE, ST1181677LWV, 0002> disk fixed
sd1: 169 GB, 24247 cyl, 24 head, 609 sec, 512 bytes/sect x 354600001 sectors
sd1: sync (12.50ns offset 63), 16-bit (160.000MB/s) transfers, tagged queueing
sd2 at scsibus1 target 0 lun 0: <IBM-ESXS, MAS3367NC     FN, B907> disk fixed
sd2: 34715 MB, 27150 cyl, 4 head, 654 sec, 512 bytes/sect x 71096640 sectors
sd2: sync (6.25ns offset 127), 16-bit (320.000MB/s) transfers, tagged queueing
sd3 at scsibus1 target 1 lun 0: <IBM-ESXS, MAS3367NC     FN, B907> disk fixed
sd3: 34715 MB, 27150 cyl, 4 head, 654 sec, 512 bytes/sect x 71096640 sectors
sd3: sync (6.25ns offset 127), 16-bit (320.000MB/s) transfers, tagged queueing
ses0 at scsibus1 target 8 lun 0: <IBM, 25P3495a S320  1, 1> processor fixed
ses0: SAF-TE Compliant Device
ses0: async, 8-bit transfers
boot device: sd2
root on sd2a dumps on sd2b
root file system type: ffs
raid0: Component /dev/sd0a being configured at col: 0
         Column: 0 Num Columns: 2
         Version: 2 Serial Number: 1142 Mod Counter: -1193820954
         Clean: Yes Status: 0
raid0: Component /dev/sd1a being configured at col: 1
         Column: 1 Num Columns: 2
         Version: 2 Serial Number: 1142 Mod Counter: -1193820954
         Clean: Yes Status: 0
raid0: RAID Level 0
raid0: Components: /dev/sd0a /dev/sd1a
raid0: Total Sectors: 709197552 (346287 MB)
wsdisplay0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x25, vt100 emulation)
wsdisplay0: screen 3 added (80x25, vt100 emulation)
wsdisplay0: screen 4 added (80x25, vt100 emulation)
sd3(mpt0:0:1:0): unable to allocate scsipi_xfer
sd3: not queued, error 12
sd3(mpt0:0:1:0): unable to allocate scsipi_xfer
sd3: not queued, error 12
sd3(mpt0:0:1:0): unable to allocate scsipi_xfer
sd3: not queued, error 12

	The kernel config is GENERIC with irrelevant parts trimmed
	out, and with the following tweaks:

maxusers        64              # estimated number of users

options         NKMEMPAGES=49152        # elevated from the default 32k

	"vmstat -m" output:

dolly# vmstat -m
vmstat: Kmem statistics are not being gathered by the kernel.
Memory resource pool statistics
Name        Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
phpool        40     3470    0     2793    11     0    11    11     0   inf    0
pcgpool      140     5099    0     5095     1     0     1     1     0   inf    0
pmappl        76    16672    0    16614    13    11     2     2     0   inf    0
pdppl       4096      908    0      849   522   463    59    62     0   inf    0
vmsppl       220    16672    0    16614    23    19     4     4     0   inf    0
vmmpepl       88 13075545    0 13074194  2480  2444    36    52     0   inf    4
vmmpekpl      88   190757    0   190536    16    10     6    11     0   inf    0
uaoeltpl      84      338    0      320     1     0     1     1     0   inf    0
aobjpl        52        1    0        0     1     0     1     1     0   inf    0
amappl        40   179692    0   179109     6     0     6     6     0   inf    0
bufpl        120 14809868    0 14802335  1006   440   566   807     0   inf    0
buf1k       1024    57212    0    51134   627   493   134   359     1   inf    0
buf2k       2048     1441    0     1291    22    15     7    11     1   inf    0
buf4k       4096      468    0      441    14     9     5     7     1   inf    0
buf8k       8192    13826    0    12718   705   469   236   398     1   inf    0
buf16k      16384     374    0      306    99    81    18    18     1   inf    1
buf32k      32768     309    0      207   118    66    52    58     1   inf    1
mbpl         256    16584    0    16300    59    38    21    26     1   inf    0
mclpl       2048     6661    0     6400  1278  1143   135   169     4   512    2
kqueuepl     260        2    0        1     1     0     1     1     0   inf    0
knotepl       72        6    0        1     1     0     1     1     0   inf    0
sockpl       200     6530    0     6389    11     3     8     9     0   inf    0
ttypl        264       67    0        0     5     0     5     5     0   inf    0
rndsample    528   188408    0   188402   256   255     1     2     0   inf    0
procpl       408    16691    0    16620    14     6     8     8     0   inf    0
lwppl        132    16691    0    16620     3     0     3     3     0   inf    0
pgrppl        16      512    0      477     1     0     1     1     0   inf    0
pcredpl       24    16691    0    16620     1     0     1     1     0   inf    0
plimitpl     156       66    0       56     1     0     1     1     0   inf    0
pstatspl     224    16691    0    16620     8     4     4     5     0   inf    0
rusgepl       72    16620    0    16620   264   263     1     1     0   inf    1
ptimerpl     124       76    0       54     1     0     1     1     0   inf    0
filepl        56   526571    0   526258    47    42     5     6     0   inf    0
cwdipl        12    16678    0    16620     1     0     1     1     0   inf    0
fdescpl      280    16678    0    16620    17    12     5     5     0   inf    0
pnbufpl     1024     1533    0     1530    11    10     1     2     0   inf    0
vnodepl      164    87193    0    21759  2839   112  2727  2727     0   inf    0
ncachepl      84   599740    0   534299  1387    23  1364  1364     0   inf    0
sdpcpool     120    65205    0    65205   232   200    32    40     0   inf   32
pagedeppl     68    10918    0    10918   170   169     1    89     0   inf    1
inodedeppl    88    43361    0    43361   522   521     1   375     0   inf    1
newblkpl      36    70302    0    70302   207   206     1     1     0   inf    1
bmsafemappl   36     2495    0     2495   184   183     1     1     0   inf    1
allocdirectpl 80    33533    0    33533   164   163     1    33     0   inf    1
indirdeppl    32      438    0      438   136   135     1     1     0   inf    1
allocindirpl  64    36769    0    36769   343   163   180   180     0   inf  180
freefragpl    40       42    0       42    30    30     0     1     0   inf    0
freeblkspl   172    33102    0    33102   681   680     1   671     0   inf    1
freefilepl    36    21465    0    21465   205   204     1   118     0   inf    1
diraddpl      36    24636    0    24636    99    98     1    19     0   inf    1
mkdirpl       32     9910    0     9910     7     7     0     6     0   inf    0
dirrempl      36    24624    0    24624   233   232     1   148     0   inf    1
newdirblkpl   20        1    0        1     1     1     0     1     0   inf    0
ffsinopl     176   806188    0   740770  2975   130  2845  2845     0   inf    0
dino1pl      128   806188    0   740770  2184    73  2111  2111     0   inf    0
scxspl       148 12058071    3 12058071  1824  1815     9    10     0   inf    9
wdcspl        52        8    0        8     1     1     0     1     0   inf    0
extent        20  8180472    0  8179235     7     0     7     7     0   inf    0
raidpl       128  3102981    0  3102981     2     0     2     2     2     4    2
rtentpl      132       27    0        0     1     0     1     1     0   inf    0
in6pcbpl     156       40    0       26     1     0     1     1     0   inf    0
inpcbpl      108     4513    0     4497     1     0     1     1     0   inf    0
inmltpl       32        2    0        0     1     0     1     1     0   inf    0
ipqepl        44      207    0      207   100   100     0     1     0   inf    0
tcpcbpl      316     1463    0     1440    83    81     2     3     0   inf    0
synpl        184      721    0      721   170   170     0     1     0   inf    0
sigapl      2052    16678    0    16620   786   727    59    62     0   inf    1
pipepl       140    24098    0    24063    61    59     2     2     0   inf    0
rf_alloclist_pl 168 2909800  0  2909799   785   781     4     4     3    11    3
rf_asmhdr_pl  16  2723924    0  2723924     1     0     1     1     1     2    1
rf_asm_pl    288  2909799    0  2909799   948   942     6     6     5    14    6
rf_pda_pl     40  3102981    0  3102981     1     0     1     1     1     2    1
rf_vple_pl     8       30    0        0     1     0     1     1     1     2    0
rf_rad_pl    324  2723924    0  2723924   916   912     4     4     3    11    4
rf_dagnode_pl 444 11832378   0 11832378  2861  2843    18    20    15    57   18
rf_dagh_pl    76  2909799    0  2909799     1     0     1     1     1     3    1
rf_daglist_pl 236 2909799    0  2909799   649   646     3     3     2     8    3
rf_funclist_pl 8  2909799    0  2909799     1     0     1     1     1     2    1
rf_dqd_pl    116  3102981    0  3102981   872   869     3     3     2     8    3

In use 57190K, total allocated 70104K; utilization 81.6%

dolly# 


>How-To-Repeat:
	Run NetNews on a box running 2.0_BETA; watch the SCSI
	subsystem complain when put under pressure. 	

>Fix:
	Sorry, don't know.
>Release-Note:
>Audit-Trail:
>Unformatted: