Subject: Re: new si and reselects (Curiouser and curiouser)
To: None <gwr@mc.com>
From: Ian Dall <dall@HFRD.DSTO.GOV.AU>
List: port-sun3
Date: 12/06/1995 11:01:19
"Gordon W. Ross" <gwr@mc.com> writes:

  >> Date: Tue, 5 Dec 1995 10:49:40 +1030
  >> From: Ian Dall <dall@hfrd.dsto.gov.au>


  >> "Gordon W. Ross" <gwr@mc.com> writes:
  > [ On the problem with SCSI commands with length 10 ]

  >> I am seeing the same problem attempting to copy the mini-root from
  >> to a Fujitsu disk (can't remember the model off hand, built in scsi
  >> interface).
  >> 
  > [ When it happens, just continue... ]

  >> What happens then? If the command is rejected you still can't
  >> read/write to the disk unless the driver does something smart to
  >> compensate.

  > The command will just fail.  (Yeah, not great.)

Of course, it depends what the problem is. If the target is noticing
that the block number is out of range before it gets the whole command
then it is a good thing for the command to fail. If it is that the
device can't handle extended reads, then that is not such a good
solution. However, I don't think either of these is the reason
(at least in my case).
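
For reference, the two command forms being discussed differ mainly in how
much they can address. The following is a minimal Python sketch of the
standard SCSI READ(6) and READ(10) command layouts. The helper names
(build_read6, build_read10, hexdump) are made up for illustration, LUN 0 is
assumed, and this is not a description of how the si or sd driver actually
builds its commands.

```python
import struct

READ_6 = 0x08   # opcode of the 6-byte READ command
READ_10 = 0x28  # opcode of the 10-byte ("extended") READ command


def build_read6(lba, nblocks):
    """Return a 6-byte READ CDB for LUN 0 (21-bit address, 8-bit length)."""
    if not 0 <= lba < (1 << 21):
        raise ValueError("block address does not fit in 21 bits")
    if not 1 <= nblocks <= 256:
        raise ValueError("READ(6) transfers 1 to 256 blocks")
    # A length byte of 0 means 256 blocks, hence the mask on nblocks.
    return bytes([READ_6, (lba >> 16) & 0x1F, (lba >> 8) & 0xFF,
                  lba & 0xFF, nblocks & 0xFF, 0])


def build_read10(lba, nblocks):
    """Return a 10-byte READ CDB (32-bit address, 16-bit length)."""
    if not 0 <= lba < (1 << 32):
        raise ValueError("block address does not fit in 32 bits")
    if not 0 <= nblocks < (1 << 16):
        raise ValueError("block count does not fit in 16 bits")
    # Fields: opcode, flags, 4-byte address, reserved, 2-byte length, control.
    return struct.pack(">BBIBHB", READ_10, 0, lba, 0, nblocks, 0)


def hexdump(cdb):
    """Format command bytes as comma-separated hex, without leading zeros."""
    return ",".join(format(b, "x") for b in cdb)
```

With these helpers, hexdump(build_read10(0, 0x40)) gives
"28,0,0,0,0,0,0,0,40,0", a 10-byte command of the form a target would have
to accept in full for an extended read to work.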


I did a bit more work and here is the information I have gathered.

Background:

   Sun 3/50, 12MB RAM, rather full SCSI bus. The configuration has all been
   working fine under SunOS.

   The relevant disks in the following are:
     sd0 Fujitsu M2624SA (512MB, 3.5in)
     sd2 Fujitsu M2246SA (150MB, 5 1/4in)

   Despite the part numbers being anagrams of each other, these are quite
   different disks, with sd0 being much newer.

I am trying to install onto sd2. sd0 is for another purpose, but is on the
same bus.

I have the miniroot loaded on sd2b.
Now:

    dd if=/dev/rsd0c bs=32k of=/dev/null

no problem. Now try the other disk:

    dd if=/dev/rsd2c bs=32k of=/dev/null

now I get the dreaded "wanted 10 got 6" message
    si(0:1:0) -28,0,0,0,0,0,0,0,40,0-
    comand aborted, info = 32 (decimal), data = 00 00 00 00 00
    00 00 00 00 00 00 08 1e 00 00 00 20 c0 31 01 00 0c 00 81 89
    ff ff ff ff

I don't know about the data, but, presuming this is from a "request sense"
command, the "info" field should be the amount read. This is odd. The
target knew to read 32 blocks even though (if it only transferred 6 bytes of
the command) it never got that parameter!

Now reduce the block size:

    dd if=/dev/rsd2c bs=16k of=/dev/null

no problem.
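
The command bytes logged for the failing 32k read can be decoded by hand.
The sketch below does so in Python, using the standard READ(10) layout
(opcode, flags, 4-byte block address, reserved byte, 2-byte block count,
control byte). The 512-byte block size is an assumption, and the function
names are hypothetical.

```python
BLOCK_SIZE = 512  # assumed bytes per block; not stated in the log


def parse_logged_cdb(text):
    """Turn a logged string such as '-28,0,...,40,0-' into raw bytes."""
    return bytes(int(field, 16) for field in text.strip("-").split(","))


def decode_read10(cdb):
    """Return (block address, block count) from a 10-byte READ command."""
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ command")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2..5: block address
    nblocks = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length
    return lba, nblocks


cdb = parse_logged_cdb("-28,0,0,0,0,0,0,0,40,0-")
lba, nblocks = decode_read10(cdb)
print(lba, nblocks, nblocks * BLOCK_SIZE)  # prints: 0 64 32768
```

Under the 512-byte assumption, the 0x40 in the length field is 64 blocks,
which is the 32k that dd asked for. A 16k request would be 32 blocks.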

Using the smaller block size is how I got the miniroot installed and I
even managed to run install to get the base collection installed.
The only thing that went wrong (apart from being slow) is fsck failed.

    fsck -f -n /dev/rsd2b

    ... wanted 10 got 6
    ... -28,0,0,0,84,6b,0,0,0,40,0-
    ...
    CANNOT READ: BLK 64
    CONTINUE?

on continuing there are more such errors. The block number in the command
is correct.

Now use the kernel debugger to "wr si_options 3":

    dd if=/dev/rsd2c bs=32k of=/dev/null

and the 32k block size works (and the transfer rate is about 10
times as high)!

Also

    fsck -f -n  /dev/rsd2b

works.

I tried thrashing the read access fairly hard

    while fsck -f -n /dev/rsd2b; do :; done& while fsck -f -n /dev/rsd2b; do :; done

and everything is rosy. So why not just use si_options = 3?  Well,
as soon as I mount the partition read/write, it gets corrupted very
quickly. With si_options = 0, on the other hand, I can mount read/write
and, so long as I don't do transfers bigger than 16k, everything is OK
(but slow). Unfortunately, fsck then doesn't work, and it is not really
an optional utility!

I have no idea whether the two behaviors (problems writing using DMA
and this apparent command size problem with polled I/O) are related.
The command size one is definitely target dependent, but the target
*can* handle 10 byte read commands, presumably with the same
parameters, using DMA. Presumably the trashing of the disk when using
DMA to write is also target dependent since others haven't seen it,
but I don't want to corrupt sd0 which is for another purpose so
I haven't tried it.

I know how hard it is to debug these things when you can't duplicate
them because you don't have the same config. Is there any way of getting
more debugging info which would be useful?

Ian