port-mac68k: re-- The esp driver

Subject: re-- The esp driver
To: port-mac68k mailing list <port-mac68k@NetBSD.ORG>
From: Steve Revilak <revilak@umbsky.cc.umb.edu>
List: port-mac68k
Date: 04/30/1998 10:47:34

Allen Briggs <briggs@canolog.ninthwonder.com> wrote:

>Some folks have seen the quick version of the esp driver give them errors
>that look something like:
>
>dmaintr: discarded 32 b (last transfer was 2544 b).
>esp0: !TC [intr 10, stat 83, step 4] prevphase 0, resid 9f0
>sd0(esp0:0:0): esp0: timed out [ecb 0x6baa8d0 (flags 0x3, dleft 9f0, stat
>0)], <state 4, nexus 0x6baa8d0, phase(c 3, p 3), resid 0, msg(q 0,o 0) >
>sd0(esp0:0:0): esp0: timed out [ecb 0x6baa8d0 (flags 0x43, dleft 9f0, stat
>0)], <state 4, nexus 0x6baa8d0, phase(c 3, p 3), resid 0, msg(q 20,o 0) >
>AGAIN
>
>Mehmet Orhun SEYMEN <o52931@sumela.ktu.edu.tr> has noticed that if he
>disables the read/write cache on the hard drive, this problem does not
>occur.  Also, I believe that this does not occur if you're not using the
>pseudo DMA/Turbo SCSI/Quick SCSI option.

I got a similar set of messages when running NetBSD from a syquest
removable drive (EZ135).  They would appear occasionally when in single
user, almost constantly while in multi-user.  After 2-3 days of light use,
the file system would become so trashed that a re-installation was
necessary in order to boot the machine.

My exact error message was:
>Mar  6 13:39:40  /netbsd: dmaintr: discarded 32 b (last transfer was
>1008b).
>Mar  6 13:39:40  /netbsd: esp0: !TC [intr 10, stat 87, step 4] prevphase
>0, resid 3f0
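
A side note on reading these messages: in the three instances quoted in
this post, the "resid" value is the hexadecimal form of the "last transfer"
byte count (0x9f0 = 2544, 0x3f0 = 1008, 0x1bf0 = 7152).  That is an
observation from the numbers alone, not something confirmed from the driver
source.  A minimal C sketch for pulling the value out of a log line, where
the function name resid_bytes is made up for illustration and is not part
of NetBSD:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not NetBSD code): find "resid " in a log line and
 * parse the hexadecimal number that follows it.
 * Returns the value in bytes, or -1 if the line has no parsable resid field.
 */
long resid_bytes(const char *line)
{
    const char *key = "resid ";
    const char *p = strstr(line, key);
    char *end;
    unsigned long value;

    if (p == NULL)
        return -1;              /* no resid field on this line */

    p += strlen(key);           /* step past "resid " to the digits */
    value = strtoul(p, &end, 16);
    if (end == p)
        return -1;              /* nothing that looks like a hex number */

    return (long)value;
}
```

Called on the "!TC" line from Allen's message, resid_bytes() gives 2544, the
same figure the preceding dmaintr line reports as the last transfer size.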


In correspondence with others while attempting to resolve this problem, I
found that other Syquest removables were suspect, as were ZIP drives.
Several related issues I investigated:

* Clock chip--I use NetBSD on a Q605 with a full 040, accelerated to 33 MHz.
Problems occurred regardless of whether the clock chip was installed.

* Kernel version--the problem occurs with the 1.3 kernel.  I also tried the
1.3.1 kernel, with the same results.

In addition to removables, some people were getting these errors from
non-removable drives, though far fewer of them.

Colin Wood responded to one post:
>My guess would be that you're getting bitten by the fact that NetBSD SCSI
>drivers seem to have a long history of _not_ liking removable media.
-------------snip
>The above isn't all that surprising.  Unfortunately, I don't know of any
>workarounds at the moment other than to try a non-removable SCSI disk.  If
>you get the chance, please file a PR on this problem, noting the exact
>error message (you can use the web form on www.netbsd.org if you don't
>want to use the send-pr command in NetBSD).
>

and from another post:
>The error you're seeing only appears under the esp driver, which only
>'040 Macs use.  That's why on that one.  The '030 Macs have a host of
>other problems ;-)  Actually, there is a long-standing bug in the ncr5380
>SCSI driver (used in the '030's) which no one has found quite yet.  This
>new bug in the esp driver has only seemed to have surfaced in the last 3
>months or so.


david@canopus.apana.org.au (David Johnston) offered the following:

>Unfortunately I don't think this has anything to do with removable media
>as such. I think it has more to do with the LC475/Q605 motherboard + new
>quadra SCSI driver not liking certain disks, particularly slow ones.
>
>I have an LC475 and I had lots of identical problems with several Quantum
>disks I own. The problem was worse the slower the disk was - with a fast
>1Gig fireball I would very occasionally get such errors, with a medium
>speed 540MB fireball I would get it more often and with a slow 40MB
>quantum I got it almost continuously. The problem seemed to occur more
>often when writing to the disk rather than when reading from it.
>
>I spoke to Allen Briggs about this problem four months or so ago, but he
>wasn't able to find the problem at the time.
>
>I bought a Seagate 2Gig medalist disk and stuck it in the machine and have
>never seen the problem again in several months of continuous operation.


Ken Nakata <kenn@eden.rutgers.edu> (using an 840 AV) wrote:

>This is a Quadra 840av.  I have two Quantum drives (internal LPS 270S
>and external Fireball 1080S), a Fujitsu M2694ES (an old 1G drive), a
>Pioneer DR-U12X CD-ROM drive, and a Zip drive (though the Zip is only
>occasionally mounted when exchanging files with other systems, much
>like the way you would use the floppy drive).
>
>Also, I've used the Fireball and the Fujitsu on my SE/30 with ncr
>driver with no problem whatsoever.  At the time, others seemed to have
>bad experience with the Fireball-ncr combination, but my memory may be
>a bit faulty.

My ultimate solution was to get NetBSD onto a non-removable drive (Apple
OEM IBM H-3171).  It's been fine ever since; however, this drive is actually
slightly *slower* than the Syquest I was using.  Maybe it's not drive
speed...



Another reply I received gave instructions for recompiling the kernel with
the quick SCSI option disabled (non-pertinent portions snipped):

>From: Dennis Eberl <drw@adelphia.net>
>Subject: Re: NetBSD scsi errors
>
>The problem I had was the seemingly random and irritatingly frequent
>occurrence of a series of error messages related to my SCSI hard drive
>followed by a reset of the SCSI. The problem was benign in that I never
>lost any data or ended up with corrupted files. Indeed, I could be in the
>middle of a vi edit, get the messages and bus reset (patience), and pick up
>the edit exactly where I left off.
>
>I'm sorry I don't remember what I did exactly to cure the problem. I am
>running 1.3 and know I had to recompile the kernel...
>
>...and now that I am awake and remembering all this (three DEC 166 MHz
>Alpha udb's running Linux 5.0 have been taking up my time), I will try to
>find the details...
>
>OK. The authority on this is Allen Briggs (briggs@macbsd.com). You have to
>recompile the kernel after changing one line in the file 'esp.c', which
>should be in '/usr/src/sys/arch/mac68k/dev/'. In 'esp.c' a line or few
>after line 238, you want to change a lonely little line that reads: 'quick
>= 1;' to 'quick = 0;'. The statement you are to edit is after line 238 and
>just before an 'if (quick) {...}' statement of 5 or 6 lines. Basically, what
>you are doing is turning off "quick SCSI" to get rid of the problem. At the
>time I had the problem "quick SCSI" had not yet been implemented on
>OpenBSD, which is why I didn't encounter it there. My impression is that
>NetBSD is ahead of OpenBSD, which is why I went to the trouble to recompile
>the kernel.
>
>In any case, that's what fixed the problem. I would contact Allen Briggs
>before doing the recompile, because he may well have fixed the problem.
>
>Best of Luck,
>
>Dennis Eberl
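
The change Dennis describes amounts to flipping a flag that selects between
two transfer paths.  The sketch below is NOT the code from esp.c; it is a
made-up, stand-alone illustration of that pattern, and every name in it
(xfer_ops, select_xfer_ops, and so on) is hypothetical.  It assumes only
what the thread says: a variable named 'quick' is set just before an
'if (quick)' test, and "quick SCSI" is the pseudo-DMA option Allen mentions.

```c
/* Hypothetical description of a transfer path; not a NetBSD structure. */
struct xfer_ops {
    const char *name;
    int uses_pseudo_dma;    /* 1 for the "quick SCSI" path, 0 otherwise */
};

static const struct xfer_ops plain_ops = { "plain", 0 };
static const struct xfer_ops quick_ops = { "quick", 1 };

/*
 * Pick the transfer path from a flag, the way a driver might do once at
 * attach time.  Passing quick = 0 is the analogue of editing the source
 * line from 'quick = 1;' to 'quick = 0;' and rebuilding the kernel.
 */
const struct xfer_ops *select_xfer_ops(int quick)
{
    if (quick)
        return &quick_ops;
    return &plain_ops;
}
```

Because the flag is a compile-time constant in the description above, there
is no way to switch paths on a running system; hence the kernel rebuild.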


I didn't have the space to recompile a kernel, though I did find someone
gracious enough to try it out:


>From: henning loeser <loeser@ma1304.physik.uni-marburg.de>
>To: Steve Revilak <revilak@umbsky.cc.umb.edu>
>Subject: Re: syquest & NetBsd
>
>Hi Steve,
>well, after an hour and 40 minutes it spit out a new kernel, and I can boot
>it. So far so good. Testing whether I can copy large files: .....
>>dmaintr: discarded 32 b (last transfer was 7152 b).
>>esp0: !TC [intr 10, stat 87, step 4] prevphase 0, resid 1bf0
>there it goes again. So I guess changing the quick SCSI to 0 didn't do the
>job.


If memory serves correctly, he was attempting to copy a large file to a
Syquest 105 removable.  (I had found that even moderately large file
operations would cause the kernel to panic, e.g. ls -lR > filename.txt
while at the root directory.)

I did file a bug report.  It's number 5133.

Sorry for the length of this posting, but hopefully there's some helpful
stuff in there.

Steve Revilak
revilak@umbsky.cc.umb.edu