Subject: SCSI woes (was Re: 2nd SCSI adapter, what to get? )
To: None <port-i386@NetBSD.ORG>
From: Simon J. Gerraty <sjg@zen.void.oz.au>
List: port-i386
Date: 11/12/1995 19:52:36
Some followup to my earlier posting... zen has been a very reliable
system with its 1542B running 3 disks and a tape drive for about 3
years.  That all changed this weekend.  If after reading this tale of
woe you can say "fool, you forgot to wave the dead chicken" or other
useful advice, I'd appreciate it.

> Ok, my scsi chain hanging off zen's ah1542b has hit its limit.  There
> is still one target left, but adding an extra .5m of cable stops the
> bus from working... total length is already well over the limit I
> suspect. 
> 
> So, it's time to bite the bullet and seriously think about adding a 2nd
> SCSI bus. 

Well after finally giving up and going off to get a 1542CP, I managed
to get a 1542CF at the last minute... great!

I took my trusty 1542B out and replaced it with the 1542CF as aha0.
That worked fine, so I re-jumpered the 1542B as aha1 on port 334, irq
12, drq 6 and put it in... the 1542B had always been set to negotiate
sync scsi, so I set the 1542CF the same way.

I had been running the 1542B at 5.7MB/s DMA speed. Since the BIOS for
the CF had a DMA test, I tried upping the speed, but it failed; 5.7 is
the limit for this box.

Initially it all worked fine.  Aha0 had sd0, st0 and cd0, aha1 had
sd1, sd2 and st1 (the DAT drive which was the purpose of this
exercise).  I then spent some time choosing what would go on each bus -
but more or less settled on the above.

Great.  I put the lid back on, and moved everything back to where
it belonged.  That's when the trouble started.  sd1 and sd2, which are
in an external shoe-box, disappeared.  It didn't matter which bus I
put them on, they would not probe... not only that - nothing on that
bus would probe.

I had un/re-plugged the cable to the external disk box while it was
powered on and was worried that I'd blown the disks.  I grabbed one
out, and stuck it on the internal bus (sitting on top of the CPU box)
and it worked so I figured the disks were ok.  What else could be
wrong...

Although there was a terminator on each end of the bus, I set the
terminator jumper on sd2 which was the last on that bus, and hey
presto both disks came back!

Great.  First thing I did was boot up and start a backup of sd2d. I
then had to go out.

When I got back a few hours later, all I could see on the console
were:

sd1(aha1:1:0): timed out
sd2(aha1:2:0): timed out

and the occasional complaint from dump.  So much for my backup.
In desperation I unplugged the 150M tape and CD, and put sd1 and sd2
on top of the CPU box and re-booted.  All looked ok, so I started the
backup of sd2d again - half an hour later it all just stopped working
again. 

I re-booted and pressed Ctrl-A (been doing that a lot this weekend).
It's rather handy to be able to select an adapter by its port address
and ask it to probe all its devices - without waiting for UNIX to
start probing.  When I asked it to probe aha1 (port 334) it said scsi
id#1 and just sat there.... I pressed esc. and asked it to do it again
- it said there was no adapter at port 334!

I cycled the power and repeated the probe - no adapter at 334.
I turned it all off and went to bed.

This morning the Adaptec BIOS was still adamant that there was no
adapter at port 334... so I ripped it out - multi-meter says the fuse
is ok btw.

So now I hooked just the disks and DAT drive to aha0 (1542CF) and
booted.  All ok. I did a backup of sd2d - it completed ok.
I shutdown, put sd1 and sd2 back in their shoe-box and connected the
rest of the internal devices.  The 1542CF now had sd0,sd1,sd2,st0,st1
and cd0!!! And it worked!  I booted up and did backups of everything
else - ok.

The machine ran happily like that _all_ day, sitting in the middle of
the room, DAT drive on the floor etc etc. Ok so I didn't even put the
lid back on the CPU (it always stops working when you do that...), I
just halted, powered down and moved the CPU back under the desk and
put the DAT drive on top - no cables re-arranged or anything.

Booted and - no scsi devices at all!  Powered off, wiggled a few
cables, booted and nothing.  Repeated the swearing, wiggling etc and
booted - all devices probed ok!

At this point my once faithful zen is far from my favourite entity.
If anyone can suggest what I should change to avoid the swearing and
cable wiggling each time I want to boot, I'd appreciate it.

--sjg