Subject: kern/18760: iop shutdown causes SCSI to disappear until cold boot
To: None <gnats-bugs@gnats.netbsd.org>
From: None <John.P.Darrow@wheaton.edu>
List: netbsd-bugs
Date: 10/21/2002 22:45:40
>Number:         18760
>Category:       kern
>Synopsis:       [jpd] iop shutdown causes SCSI to disappear until cold boot
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Oct 21 20:47:00 PDT 2002
>Closed-Date:
>Last-Modified:
>Originator:     John Darrow
>Release:        NetBSD 1.6
>Organization:
	Computing Services
	Wheaton College, Wheaton, IL
>Environment:
System: NetBSD azariah.wheaton.edu 1.6 NetBSD 1.6 (GENNO386) #0: Thu Sep 26 21:13:39 CDT 2002 jdarrow@rebekah.wheaton.edu:/var/src/sys/arch/i386/compile/GENNO386 i386
Architecture: i386
Machine: i386
>Description:
Normally, when the machine boots, it first goes through a memory
countup (on a cold boot) or a long pause (on a warm boot), then a
quick POST of the IDE (recognizing the CDROM, etc.).  It then
initializes the SCSI BIOS, with the adapter counting through the
three SCSI buses and recognizing the logical disk.  Finally, NetBSD
boots from the logical drive.

When I attempt a warm reboot of the machine (using shutdown -r now),
the system gets to:
syncing disks... done
shutting down iop devices... done
rebooting...

The system starts to reboot as normal, but after the pause and POST
it doesn't seem to notice that the SCSI adapter exists.  It goes
straight through with no SCSI BIOS initialization, then stops with
"No operating system found" (a message built into the system BIOS).
The system requires a cold boot (reset button or power off and on)
to recognize the SCSI BIOS again.

Relevant chunks of dmesg follow:

NetBSD 1.6 (GENNO386) #0: Thu Sep 26 21:13:39 CDT 2002
    jdarrow@rebekah.wheaton.edu:/var/src/sys/arch/i386/compile/GENNO386
cpu0: Intel Pentium Pro (686-class), 200.02 MHz
cpu0: I-cache 8 KB 32b/line 4-way, D-cache 8 KB 32b/line 2-way
cpu0: L2 cache 1 MB 32b/line 4-way
cpu0: features fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features fbff<PGE,MCA,CMOV>
total memory = 1023 MB
avail memory = 944 MB
using 6144 buffers containing 52508 KB of memory
BIOS32 rev. 0 found at 0xf7cee
mainbus0 (root)
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
[...]
ppb0 at pci0 dev 15 function 0: Intel product 0x0960 (rev. 0x05)
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled, rd/line, wr/inv ok
iop0 at pci0 dev 15 function 1: I2O adapter <MegaRAID>
iop0: interrupting at irq 11
Intel 82451KX/GX Memory Controller (MC) (RAM memory, revision 0x05) at pci0 dev 20 function 0 not configured
[...]
biomask eb45 netmask ef65 ttymask ffe7
iop0: configuring...
ld0 at iop0 tid 16: <I2O, RAID1, 1.00> direct access, fixed, 2048kB cache
ld0: 17365 MB, 8820 cyl, 64 head, 63 sec, 512 bytes/sect x 35563520 sectors
Kernelized RAIDframe activated
boot device: ld0
root on ld0a dumps on ld0b
root file system type: ffs

>How-To-Repeat:
See above.  Basically, run "shutdown -r now" on an affected system.
>Fix:
Not known.  iop_shutdown in sys/dev/i2o/iop.c sends two commands to
the IOP upon shutdown: an I2O_EXEC_SYS_QUIESCE and an
I2O_EXEC_IOP_CLEAR.  It is unknown which of these causes the failure,
or if it is something else entirely.  However, I am a little leery of
messing with this, for fear of losing data on shutdown if I disable
something that is necessary to flush the disks.
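
One way to narrow this down might be to make each of the two commands
individually skippable and try a warm reboot with each one omitted in
turn.  The following is only a userland sketch of that bisection idea,
not a patch: model_iop_shutdown, struct shutdown_trace, SKIP_QUIESCE
and SKIP_CLEAR are hypothetical names that do not exist in iop.c, the
command order is assumed to match the order listed above, and the code
only prints and counts rather than talking to any hardware.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flags for this sketch; nothing like them exists in iop.c. */
#define SKIP_QUIESCE 0x01	/* omit I2O_EXEC_SYS_QUIESCE */
#define SKIP_CLEAR   0x02	/* omit I2O_EXEC_IOP_CLEAR */

/* Counts how many times each command was "sent" in this model. */
struct shutdown_trace {
	int quiesce;
	int clear;
};

/*
 * Userland model of the shutdown sequence: quiesce first, then clear
 * (assumed order), each individually skippable via the flags above.
 */
static void
model_iop_shutdown(int skip, struct shutdown_trace *t)
{
	/* Reject flag bits this sketch does not define. */
	assert((skip & ~(SKIP_QUIESCE | SKIP_CLEAR)) == 0);

	if ((skip & SKIP_QUIESCE) == 0) {
		t->quiesce++;
		puts("send I2O_EXEC_SYS_QUIESCE");
	}
	if ((skip & SKIP_CLEAR) == 0) {
		t->clear++;
		puts("send I2O_EXEC_IOP_CLEAR");
	}
}
```

Calling model_iop_shutdown(SKIP_CLEAR, &t) would model a shutdown that
only quiesces, and model_iop_shutdown(SKIP_QUIESCE, &t) one that only
clears.  Whether either variant is safe for the data on the array is
exactly the open question above, so any real test of this kind should
be done on a machine whose disks can be restored.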

>Release-Note:
>Audit-Trail:
>Unformatted: