Subject: Multitude-of-disks (was: Re: Looking for newer binaries... )
To: None <macbsd-development@sun-lamp.cs.berkeley.edu>
From: Space Case <wormey@eskimo.com>
List: macbsd-development
Date: 12/12/1993 22:02:40
(n.b.:  This thread was originally on macbsd-general, but it seems to
be more in the domain of -development, so I directed it there.)

briggs@csugrad.cs.vt.edu (Allen X Briggs) wrote:
[monroe@cs.pdx.edu (Monroe Williams):]
>> FYI, I've had the same multitude-of-disks problem other people have
>> reported with the netbsd.current kernel.

>Hmmmm.  This is a weird problem.  I've gotten a fix from a Steve that
>seems to work, although I can't explain why.  It looks almost like
>the disks are returning success for test unit ready for all luns.

Yup, that Steve was me.  I don't have a copy of the NCR5380 (SCSI chip)
manual or the SCSI spec, so my fix was purely an educated guess that
happened to work.  I also haven't traced all the code to find the fix's
side effects, which is why I am writing this: to get more eyes looking
for places where it might cause problems.

During the device probe stage, three commands are sent to each lun.
A command can reach the lun by either of two paths, both in
mac68k/dev/scsi.c:

  scsi_group0
             \
              scsi_request -> command_transfer
             /
     scsi_gen

scsi_group0 builds a command structure, which gets passed, along with
scsi target and lun, to scsi_request.  The only thing scsi_request does
with lun is to use it in an error message.

scsi_gen receives a command structure, target and lun.  However, at least
in the probe stage, the command structure is all zeros.  In the original
code, these all get passed unchanged to scsi_request.

It seems (and this is where the educated guess comes in) that the command
structure passed to the scsi chip consists of a command byte followed by
a number of data bytes.  The high three bits of the first data byte
contain the lun; the remaining five bits and the next two bytes contain
the block address.  (That is the layout in the structure built by
scsi_group0, anyway.)  Since the command structure arriving via scsi_gen
is all zeros, its lun field is always zero, so every command sent that
way addresses lun 0, no matter which lun the caller asked for.  That
would explain the disks appearing to return success for test unit ready
on every lun.
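To make the guessed layout concrete, here is a minimal C sketch of a
6-byte (Group 0) command block as described above, plus the two trailing
bytes a Group 0 command carries (transfer length and control).  The names
build_group0_cdb and cdb_lun are hypothetical and are not taken from
mac68k/dev/scsi.c; the only point is that an all-zero block decodes as
lun 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack a 6-byte (Group 0) SCSI command block.
 *   byte 0: command (opcode)
 *   byte 1: bits 7-5 = lun, bits 4-0 = block address bits 20-16
 *   byte 2: block address bits 15-8
 *   byte 3: block address bits 7-0
 *   byte 4: transfer length
 *   byte 5: control
 */
static void build_group0_cdb(uint8_t cdb[6], uint8_t opcode, unsigned lun,
                             uint32_t lba, uint8_t len)
{
    assert(lun < 8);          /* only three bits are available for the lun */
    assert(lba <= 0x1fffff);  /* 21-bit block address */

    cdb[0] = opcode;
    cdb[1] = (uint8_t)((lun << 5) | ((lba >> 16) & 0x1f));
    cdb[2] = (uint8_t)((lba >> 8) & 0xff);
    cdb[3] = (uint8_t)(lba & 0xff);
    cdb[4] = len;
    cdb[5] = 0;
}

/* Return the lun a target would see in this command block. */
static unsigned cdb_lun(const uint8_t cdb[6])
{
    return (cdb[1] >> 5) & 0x07;
}
```

For example, build_group0_cdb(c, 0x00, 3, 0, 0) (0x00 being TEST UNIT
READY) sets byte 1 to 0x60, which cdb_lun decodes as lun 3, whereas a
zeroed block, like the one scsi_gen receives at probe time, decodes as
lun 0.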

What I did (and its side effects still need to be checked) was add a
line to scsi_gen that places the lun into the high three bits of the
first data byte.  And presto, the multitude-of-disks problem went away.
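Here is a sketch of the kind of line described above, under stated
assumptions: cdb_set_lun, toy_target_ready and probe_count are
hypothetical names, and the toy target (a disk that exists only at lun 0
and answers any command addressed to lun 0) is a model, not the real
NCR5380 code path.  Masking with 0x1f preserves the low five address
bits, so a caller that already passed a correctly filled-in structure
through scsi_gen for the same lun would be unaffected.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical version of the one-line change to scsi_gen: put the lun
 * into bits 7-5 of byte 1 while keeping address bits 4-0 intact.
 * This is NOT the actual line in mac68k/dev/scsi.c. */
static void cdb_set_lun(uint8_t *cdb, unsigned lun)
{
    assert(lun < 8);
    cdb[1] = (uint8_t)((cdb[1] & 0x1f) | (lun << 5));
}

/* Toy model (assumption): a disk present only at lun 0 reports success
 * for any command whose lun field is 0. */
static int toy_target_ready(const uint8_t *cdb)
{
    return ((cdb[1] >> 5) & 0x07) == 0;
}

/* Probe luns 0-7 with an all-zero command, as the scsi_gen path does,
 * optionally applying the fix.  Returns how many luns appear present. */
static int probe_count(int apply_fix)
{
    int found = 0;

    for (unsigned lun = 0; lun < 8; lun++) {
        uint8_t cdb[6];

        memset(cdb, 0, sizeof cdb);
        if (apply_fix)
            cdb_set_lun(cdb, lun);
        if (toy_target_ready(cdb))
            found++;
    }
    return found;
}
```

With the toy target, probe_count(0) reports all eight luns present (the
multitude of disks), while probe_count(1) reports only one.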

I can see three possible outcomes from this:  1) the general direction
is correct and more of the command structure needs to be filled in;
2) it is perfectly OK as it is and nothing else needs to be done; or
3) this causes a conflict with a procedure that passes a correct
command structure through scsi_gen, and a better way needs to be found.

I would appreciate it if some of you kind souls out there would look
at the code, and either confirm my hypothesis, or if I'm wrong, point
out a better way.

~Steve

BTW, I also discovered that the code in serattach() (mac68k/dev/ser.c)
that initializes the serial ports kills the serial boot echo.  I had to
comment it out to get my trace output to my other machine.  (That was
faster than tracking down the problem. ;^)