Subject: kern/10574: major ahc problems on 1.4.2 after running current on same machine
To: None <gnats-bugs@gnats.netbsd.org>
From: None <wiz@danbala.tuwien.ac.at>
List: netbsd-bugs
Date: 07/12/2000 17:51:13
>Number:         10574
>Category:       kern
>Synopsis:       major ahc problems on 1.4.2 after running current on same machine
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jul 12 17:52:00 PDT 2000
>Closed-Date:
>Last-Modified:
>Originator:     Wiz
>Release:        1.4.2/i386
>Organization:
Thomas Klausner - wiz@danbala.tuwien.ac.at
The horizon of many people is a circle with radius zero -- and
that is what they call their point of view. (found on /.)
>Environment:
	
System: NetBSD hiro 1.5B NetBSD 1.5B (HIRO.ISDN) #0: Tue Jul 11 16:17:12 CEST 2000 wiz@hiro:/archive/cvs/src/sys-i4b/arch/i386/compile/HIRO.ISDN i386

>Description:
Ever since I ran -current with the new ahc driver on this machine, I
have had major problems booting 1.4.2 on it (-current itself works
fine). The controller is an Adaptec 2940UW.

During the 1.4.2 boot, the first two devices on the SCSI bus seem to
be recognized correctly, but then I get the following messages
(transcribed by hand):
	ahc0: target 3 synchronous at 20.0MHz, offset 0xf
	ahc0: board is not responding
	cmd fail
	probe(ahc0:3:0): timed out in datain phase, SCSIGI == 0x44
	probe(ahc0:3:0): asserted ATN - device reset in message buffer
	sd6 at scsibus0 target 3 lun 0: <, , > SCSI 0 0/direct fixed
	ahc0: board is not responding
	ahc0: board is not responding
	sd6: mode sense(4) returned nonsense, using fictitious geometry
	ahc0: board is not responding
	cmd fail
	sd6: 40960MB, 40960 cyl, 64 head, 32 sec, 512 bytes/sect x 83886081 sectors
	ahc0: board is not responding
	cmd fail
	ahc0: board is not responding
	cmd fail
	sd2 at scsibus0 target 4 lun 0: <, , > SCSI 0 0/direct fixed
	ahc0: board is not responding
	ahc0: board is not responding
	sd2: mode sense(4) returned nonsense, using fictitious geometry
	ahc0: board is not responding
	cmd fail
	sd2: 40960MB, 40960 cyl, 64 head, 32 sec, 512 bytes/sect x 83886081 sectors
	ahc0: board is not responding
	cmd fail
	ahc0: board is not responding
	cmd fail
	sd3 at scsibus0 target 5 lun 0: <, , > SCSI 0 0/direct fixed
	ahc0: board is not responding
	ahc0: board is not responding
	sd3: mode sense(4) returned nonsense, using fictitious geometry
	ahc0: board is not responding
	cmd fail
	sd3: 40960MB, 40960 cyl, 64 head, 32 sec, 512 bytes/sect x 83886081 sectors
	ahc0: board is not responding
	cmd fail
	ahc0: board is not responding
	cmd fail
	sd7 at scsibus0 target 6 lun 0: <, , > SCSI 0 0/direct fixed
	sd7: drive offline
	ahc0: target 0 synchronous at 20MHz, offset=0x8
	sd0(ahc0:0:0): timed out in message in phase, SCSIGI==0xb6
	ahc0: Issued channel A Bus Reset #1. 2 SCBs aborted
	ahc0: target 0 using 16bit transfers
	ahc0: target 0 synchronous at 20MHz, offset=0x8
	ahc1: target 0 synchronous at 20MHz, offset=0xf
	ahc2: target 0 synchronous at 20MHz, offset=0xf
	sd6(ahc0:3:0): timed out in datain phase, SCSIGI==0x44
	sd6(ahc0:3:0): asserted ATN - device reset in message buffer
	sd6(ahc0:3:0): timed out in datain phase, SCSIGI==0x54
	ahc0: Issued Channel A Bus Reset #1. 1 SCBs aborted.
	findroot: can't open dev sd6a
(some more lines follow, and then it wedges)

Particularly odd are the fictitious geometries, and that 1.4.2 reports
sd7 at target 6 as offline -- on -current it's sd3 at target _5_.
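
As a rough sanity check on those geometry lines, the arithmetic can be
worked through in a few lines of Python. This is only a sketch using the
numbers from the hand-transcribed output above; the function names are
made up for illustration and are not taken from the sd driver.

```python
# Values copied from the hand-transcribed sd6/sd2/sd3 lines in this report.
CYLINDERS = 40960
HEADS = 64
SECTORS_PER_TRACK = 32
BYTES_PER_SECTOR = 512
REPORTED_SECTORS = 83886081


def chs_sectors(cylinders, heads, sectors_per_track):
    """Total sectors implied by a cylinder/head/sector geometry."""
    return cylinders * heads * sectors_per_track


def size_mb(sectors, bytes_per_sector):
    """Capacity in MB (1 MB = 1024 * 1024 bytes), rounded down."""
    return sectors * bytes_per_sector // (1024 * 1024)


chs_total = chs_sectors(CYLINDERS, HEADS, SECTORS_PER_TRACK)
print("sectors implied by C/H/S:", chs_total)
print("size in MB:", size_mb(chs_total, BYTES_PER_SECTOR))
print("reported minus implied:", REPORTED_SECTORS - chs_total)
```

The C/H/S product matches the printed 40960MB, and the printed sector
count is one higher than that product. Three different drives reporting
the very same numbers fits the "fictitious geometry" message; whether
the extra sector comes from the kernel or from my transcription I
cannot tell from this output alone.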

It's not possible for me to boot 1.4.2 and work with it. I can,
however, mount the 1.4.2 partition and access all drives from -current
on the same machine without any trouble.

I'll attach a working 1.4.2-dmesg and a -current (1.4ZD) dmesg output
for the machine.
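
To compare device numbering between two dmesg outputs, something like
the following could be used. It is a minimal sketch: the regular
expression only assumes the "sdN at scsibusM target T lun L:" shape seen
in the transcript above, the helper names are hypothetical, and the
sample text consists solely of lines copied from this report.

```python
import re

# Matches attach lines such as:
#   sd6 at scsibus0 target 3 lun 0: <, , > SCSI 0 0/direct fixed
ATTACH_RE = re.compile(r"^(sd\d+) at (scsibus\d+) target (\d+) lun (\d+):")


def parse_attachments(dmesg_text):
    """Map each sd device name to its (bus, target, lun) tuple."""
    found = {}
    for line in dmesg_text.splitlines():
        match = ATTACH_RE.match(line.strip())
        if match:
            name, bus, target, lun = match.groups()
            found[name] = (bus, int(target), int(lun))
    return found


def diff_attachments(first, second):
    """Return devices whose location differs or that exist on one side only."""
    names = sorted(set(first) | set(second))
    return {
        name: (first.get(name), second.get(name))
        for name in names
        if first.get(name) != second.get(name)
    }


# Lines taken from the failing 1.4.2 boot transcribed in this report.
SAMPLE_142 = """\
sd6 at scsibus0 target 3 lun 0: <, , > SCSI 0 0/direct fixed
ahc0: board is not responding
sd2 at scsibus0 target 4 lun 0: <, , > SCSI 0 0/direct fixed
sd3 at scsibus0 target 5 lun 0: <, , > SCSI 0 0/direct fixed
sd7 at scsibus0 target 6 lun 0: <, , > SCSI 0 0/direct fixed
"""

attachments = parse_attachments(SAMPLE_142)
for name in sorted(attachments):
    print(name, attachments[name])
```

Running parse_attachments() on both the working and the failing dmesg
and passing the results to diff_attachments() would list every sd
device that shows up at a different target, or in only one of the two.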

I don't think this is a hardware problem: -current runs without
problems, 1.4.2 previously ran for at least half a year with almost no
problems, and other operating systems show no signs of trouble either.

Any ideas what could be the cause?

1.4.2 is on sd0a, -current on sd5a.
>How-To-Repeat:
It should suffice to boot a -current kernel on a machine with an
Adaptec controller (a 2940UW in my case), and then try to boot 1.4.2
again -- at least that's what did it for me.
>Fix:
Sorry, don't know.
>Release-Note:
>Audit-Trail:
>Unformatted: