Subject: problems with siop(4) and various SCSI devices....
To: NetBSD-current Discussion List <current-users@NetBSD.ORG>
From: Greg A. Woods <woods@weird.com>
List: current-users
Date: 04/19/2001 01:58:18
I finally got around to playing with some older ST43400N drives on my
development server.  I attached them to the same bus where I have my
Exabyte 210 library and its Eliant 820 drive.  Here's how it all probes
with my 2001/03/24 kernel on i386 (the card is an ASUS PCI SC875):

| siop0 at pci0 dev 14 function 0: Symbios Logic 53c875 (ultra-wide scsi)
| siop0: using on-board RAM
| siop0: interrupting at irq 15
| scsibus1 at siop0: 16 targets, 8 luns per target
| scsibus1: waiting 2 seconds for devices to settle...
| siop0: alloc newcdb at PHY addr 0x1b000
| siop0: target 0 using tagged queuing
| sd4 at scsibus1 target 0 lun 0: <SEAGATE, ST43400N, 1028> SCSI2 0/direct fixed
| siop0: target 0 now synchronous at 10.0Mhz, offset 15
| sd4: 2777 MB, 2737 cyl, 21 head, 98 sec, 512 bytes/sect x 5688447 sectors
| siop0: target 1 using tagged queuing
| sd5 at scsibus1 target 1 lun 0: <SEAGATE, ST43400N, 0116> SCSI2 0/direct fixed
| siop0: target 1 now synchronous at 10.0Mhz, offset 15
| sd5: 2777 MB, 2737 cyl, 21 head, 98 sec, 512 bytes/sect x 5688447 sectors
| siop0: target 2 using tagged queuing
| sd6 at scsibus1 target 2 lun 0: <SEAGATE, ST43400N, 1028> SCSI2 0/direct fixed
| siop0: target 2 now synchronous at 10.0Mhz, offset 15
| sd6: 2777 MB, 2737 cyl, 21 head, 98 sec, 512 bytes/sect x 5688447 sectors
| siop0: target 3 using tagged queuing
| sd7 at scsibus1 target 3 lun 0: <SEAGATE, ST43400N, 1028> SCSI2 0/direct fixed
| siop0: target 3 now synchronous at 10.0Mhz, offset 15
| sd7: 2777 MB, 2737 cyl, 21 head, 98 sec, 512 bytes/sect x 5688447 sectors
| st0 at scsibus1 target 4 lun 0: <EXABYTE, EXB-85058HE-0000, 0108> SCSI2 1/sequential removable
| st0: siop0: target 4 now synchronous at 5.0Mhz, offset 15
| drive empty
| ch0 at scsibus1 target 5 lun 0: <EXABYTE, EXB-210, 3.08> SCSI2 8/changer removable
| ch0: 11 slots, 1 drive, 1 picker, 0 portals

One thing I can say about siop(4) vs. ncr(4) is that it's one heck of a
lot faster, at least with disks!  With the above drives under the ncr(4)
driver, performance basically sucked.

With siop and a filesystem using fragsz=4096, blksz=32768, spg=32:

              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
         1000  2564 19.0  2571  8.8  1453  5.3  4432 35.8  4584  7.8  31.9  1.5

With four simultaneous bonnie jobs, each on one of four disks, all on
the same bus:

              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
          100  2044 16.4  2188  7.9   976  3.9  1840 15.9  2125  4.0  31.2  1.5
          100  2048 16.5  2148  7.9   966  4.0  1788 15.4  2149  4.0  30.7  1.5
          100  1997 16.0  2453  9.1   913  3.9  1757 15.1  2204  4.2  30.3  1.5
          100  2042 16.4  2380  8.9   835  3.7  1848 15.8  2434  4.7  29.9  1.4

(FYI this machine has 192MB RAM)
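
To put the two tables side by side, here is a rough Python sketch that
sums the block-I/O columns of the four simultaneous runs and compares
them with the single-drive run.  The numbers are copied from the tables
above; the dictionary and function names are made up for illustration.
Keep in mind that the single run used a 1000MB file and the parallel
runs used 100MB files, so with 192MB of RAM the comparison is only
approximate.

```python
# Figures copied from the bonnie tables above (all in K/sec).
single = {"block_out": 2571, "block_in": 4584}
parallel = {
    "block_out": [2188, 2148, 2453, 2380],
    "block_in": [2125, 2149, 2204, 2434],
}


def aggregate(rates):
    """Sum the per-drive rates of the simultaneous runs."""
    return sum(rates)


for column in ("block_out", "block_in"):
    total = aggregate(parallel[column])
    ratio = total / single[column]
    print(f"{column}: {total} K/sec total, {ratio:.2f}x the single-drive run")
```

The totals come to 9169 K/sec for block output and 8912 K/sec for block
input across the shared bus.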

The CCD driver seems to get about 6.5MB/s with these drives (98-sector stripes).
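
For reference, a 98-sector stripe corresponds to one track as reported
in the probe lines above ("98 sec").  The small Python sketch below
works out the stripe size in bytes and double-checks the reported
capacity; the constant and function names are illustrative only, and
the geometry is simply what the drive reports, which may not be its
physical layout.

```python
SECTOR_BYTES = 512        # "512 bytes/sect" in the sd(4) probe lines
SECTORS_PER_TRACK = 98    # "98 sec" in the sd(4) probe lines
TOTAL_SECTORS = 5688447   # per drive, from the sd(4) probe lines


def stripe_bytes(sectors, sector_bytes=SECTOR_BYTES):
    """Size in bytes of a stripe made of `sectors` sectors."""
    return sectors * sector_bytes


def capacity_mb(total_sectors, sector_bytes=SECTOR_BYTES):
    """Capacity in whole megabytes (2^20 bytes), rounded down."""
    return total_sectors * sector_bytes // (1024 * 1024)


print(stripe_bytes(SECTORS_PER_TRACK))  # bytes in one 98-sector stripe
print(capacity_mb(TOTAL_SECTORS))       # should agree with the probe output
```

That gives 50176 bytes (49KB) per stripe, and 2777 MB per drive, which
matches what the kernel printed.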

However either these drives are more flakey than I thought, or their
power supply is, or siop(4) is -- they are in a TRIMM chassis with a
330W power supply that cannot spin them all up simultaneously despite
what its specifications claim, so it may be the PS, but if they spin up
one at a time they work fine.  Unfortunately after a bus reset they're
not always easy to get going again.

Does/can the driver spin down disks when they're not mounted/open?

The timeouts for spinning up these massive old disks seem a little low
too, though like I say they (or one) may be a bit flakey.

Sometimes it takes a lot of persuasion to bring all the drives back
online after a reset -- sometimes I have to power them down and run
"scsictl identify" on each in succession, and sometimes I need to repeat
the whole process several times over.
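
The "identify each drive in succession, then repeat" routine could be
scripted along the following lines.  This is only a sketch: the device
names are taken from the probe output above, while the attempt count,
the delay, and the function names are assumptions chosen for
illustration.  It also assumes that scsictl(8) exits non-zero when a
drive does not answer.

```python
import subprocess
import time

DISKS = ["sd4", "sd5", "sd6", "sd7"]  # device names from the probe output


def identify_command(disk):
    """Build the scsictl(8) invocation for one disk."""
    return ["scsictl", disk, "identify"]


def run_identify(disk):
    """Return True if `scsictl <disk> identify` exits successfully."""
    return subprocess.run(identify_command(disk)).returncode == 0


def revive(disks, attempts=3, delay=10.0, probe=run_identify, sleep=time.sleep):
    """Probe each disk in turn and retry the failures.

    Makes up to `attempts` passes over the disks that have not yet
    answered and returns the list of disks that never did.
    """
    pending = list(disks)
    for _ in range(attempts):
        still_down = []
        for disk in pending:
            if not probe(disk):
                still_down.append(disk)
            sleep(delay)  # give one drive time to settle before the next
        pending = still_down
        if not pending:
            break
    return pending
```

Calling `revive(DISKS)` as root would return the drives that are still
offline after three passes.  The `probe` and `sleep` arguments exist
only so the loop can be tried out without touching real hardware.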

Unfortunately this driver doesn't work very well with the EXB-210.
Amanda tries to find the correctly labeled tape and causes this:

| siop0: target 4 now synchronous at 5.0Mhz, offset 15
| DMA IRQ: bus fault dma fifo empty, DSP=0x3c2b4 DSA=0xffffffff: siop0: current DSA invalid
| siop0: scsi bus reset
| cmd 0xc0904a40 (target 5:0) in reset list
| cmd 0xc0904a40 (status 2) about to be processed

Unfortunately any time the bus is reset and Amanda is running I get tons
of these while the library finds out where all the tapes are again:

| ch0(siop0:5:0):  Check Condition on CDB: 0xa5 00 00 56 00 52 00 08 00 00 00 00
|     SENSE KEY:  Not Ready
|      ASC/ASCQ:  Logical Unit Is in Process Of Becoming Ready

Now I do have "options SCSIVERBOSE" and "options DEBUG", but this is a
little much -- it's *NOT* an error!!!
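
For anyone trying to read that message: opcode 0xa5 sent to a changer
is MOVE MEDIUM, and "Logical Unit Is in Process Of Becoming Ready" is
sense key 0x2 (Not Ready) with ASC/ASCQ 0x04/0x01.  The Python sketch
below decodes the CDB from the log and shows one way a caller might
recognise that sense data as a "wait and retry" condition rather than
a failure.  The function names are hypothetical, and this is not how
the kernel or Amanda actually handle it.

```python
import struct

MOVE_MEDIUM = 0xA5  # SCSI media-changer opcode


def decode_move_medium(cdb_hex):
    """Split a 12-byte MOVE MEDIUM CDB into its element addresses."""
    cdb = bytes.fromhex(cdb_hex.replace("0x", ""))
    if len(cdb) != 12 or cdb[0] != MOVE_MEDIUM:
        raise ValueError("not a 12-byte MOVE MEDIUM CDB")
    # Bytes 2-3, 4-5 and 6-7 are big-endian element addresses.
    transport, source, destination = struct.unpack(">HHH", cdb[2:8])
    return {"transport": transport, "source": source, "destination": destination}


def is_becoming_ready(sense_key, asc, ascq):
    """True for NOT READY (0x2) with ASC/ASCQ 0x04/0x01."""
    return (sense_key, asc, ascq) == (0x2, 0x04, 0x01)


print(decode_move_medium("0xa5 00 00 56 00 52 00 08 00 00 00 00"))
print(is_becoming_ready(0x2, 0x04, 0x01))
```

The CDB in the log decodes to transport element 86, source element 82
and destination element 8.  Which of those addresses is the drive and
which is a slot depends on the library's element map, which isn't
shown here.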

In any case eventually the tape comes online again:

| siop0: target 4 now synchronous at 5.0Mhz, offset 15

but the "not ready" noise continues while the library counts its tapes....

I'm going to disconnect the drives just to eliminate any variables and
go back to the same hardware config I had when Amanda last worked
properly, and then try another amflush.  If that doesn't work I'll have
to go back to the ncr(4) driver on this machine for now....

Oh, and are there any patches around to make this work?

	# mt status
	mt: /dev/nrst0: Operation not supported by device

I'm also going to move the disks over to my development sparc-20 just
to see if they're less flakey on its bus.  I'm also still looking for a
replacement 400W or 450W PS/2-style AT power supply
(though none of the stores I've tried so far around here will even try
to order any of the three specific still-current part numbers I gave to
them...).

My goal is to experiment with RAIDframe with these drives (I've even got
two cold spares...), but I guess it doesn't really matter whether I do it
on the sparc-20 or the pII machine.

-- 
							Greg A. Woods

+1 416 218-0098      VE3TCP      <gwoods@acm.org>     <woods@robohack.ca>
Planix, Inc. <woods@planix.com>;   Secrets of the Weird <woods@weird.com>