Subject: Re: Serious SCSI-problems
To: None <apriebe@aip.de>
From: Michael L. Hitch <mhitch@lightning.msu.montana.edu>
List: port-pmax
Date: 07/23/1999 16:37:55
On Wed, 21 Jul 1999 apr@spade.apc.aip.de wrote:

> >   I sort of know the reason, and it should be fixed in 1.4.1.  There might
> > be a pre-1.4.1 kernel out on ftp.netbsd.org.
> 
> I am one of the guys who suffered from the SCSI problem for almost one
> year (or more???).
> And indeed the pre-1.4.1 snapshot kernel has shown no problems for over a week now!
> Michael, could you enlighten us as to HOW you solved the problem?

  Um, lots of debugging with printf().  Enabling the DEBUG code currently
in the driver changed timing enough that the problem disappeared, so I had
to tackle the problem differently.  I don't recall all I went through, but
some output in asc_get_status() indicated that the DMA length of the
current operation was 10 - which is the length of the SCSI command for the
read and write operations.  I made a guess that the extraneous data left
in the FIFO when asc_get_status() was getting called was residue from the
DMA transfer of the SCSI command, so I started sticking printf() output in
asc_startcmd() and found that I was seeing a non-zero fifo count at times,
particularly just before it failed.  I couldn't track down exactly where
that FIFO data was coming from - trying to flush it only caused more
problems.  It looked as though there might be a conflict between starting
a new command and getting a reselect, and that collision was confusing the
SCSI chip.  The start code was preloading the FIFO with one byte of data,
and I moved that preload to immediately before sending the command byte to
the SCSI chip.  Once I did that, I no longer saw that particular problem.
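
  The effect of that reordering can be illustrated with a toy model.  This
is only a sketch under stated assumptions, not the actual asc driver code:
the function name fifo_at_reselect(), the enum, and the SETUP_STEPS count
are all hypothetical, and the "reselect" is just a step number at which the
model samples the FIFO count.  It shows that with an early preload, a
reselect arriving during setup finds a byte already in the FIFO, while with
a late preload it finds the FIFO empty.

```c
#include <assert.h>

/* Hypothetical names for the two orderings; not taken from the driver. */
enum order { PRELOAD_EARLY, PRELOAD_LATE };

#define SETUP_STEPS 3   /* made-up number of setup steps before the command */

/*
 * Toy model of starting a command.  Returns the FIFO count seen at the
 * moment a reselect arrives at reselect_step, or -1 if the command byte
 * is issued before any reselect arrives.
 */
static int fifo_at_reselect(enum order o, int reselect_step)
{
    int fifo = 0;
    int step;

    assert(reselect_step >= 0);

    if (o == PRELOAD_EARLY)
        fifo++;                 /* old order: preload, then do the setup */

    for (step = 0; step < SETUP_STEPS; step++) {
        if (step == reselect_step)
            return fifo;        /* reselect lands while setup is running */
    }

    if (o == PRELOAD_LATE)
        fifo++;                 /* new order: preload right before the command */

    /* The command byte would be issued here, closing the window. */
    return -1;
}
```

  In this model, fifo_at_reselect(PRELOAD_EARLY, s) is 1 for every setup
step s, while fifo_at_reselect(PRELOAD_LATE, s) is 0, so the late preload
leaves no stray byte for a reselect to trip over during setup.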

> BTW The __-current__ kernel now gives me those TLB errors (see my mail from
> Mon, 28 Jun 1999 on this list) as with -current from July 4th.
> Any solution? I received just one mail from Tohru Nishimura, who experienced
> the same problem.

  All my systems are still at 1.4 or 1.4.1, so I have not had a chance to
even try building or running any -current systems.  I'm really getting
behind :-(.

Michael

---
Michael L. Hitch			osymh@montana.edu
Computer Consultant
Information Technology Center
Montana State University	Bozeman, MT	USA