Subject: SPARCengine Ultra AXi bogus power failure
To: None <port-sparc64@netbsd.org>
From: Francis Devereux <francis@devrx.org>
List: port-sparc64
Date: 02/27/2003 15:13:21
I have a Sunray workstation based on the SPARCengine Ultra AXi board.  When I
boot NetBSD (1.6) I get the message "Power Failure Detected: Shutting down
NOW." and the machine powers off.  Here are the messages I get:

Executing last command: boot disk1
Boot device: /pci@1f,0/pci@1/scsi@1/disk@1,0  File and args:
NetBSD IEEE 1275 Bootblock
..>> NetBSD/sparc64 OpenFirmware Boot, Revision 1.5
>> (autobuild@cs20.apochromatic.org, Sun Sep  8 11:34:12 UTC 2002)
loadfile: reading header
elf64_exec: Booting /pci@1f,0/pci@1/scsi@1/disk@1,0:a/netbsd
3718088@0x1000000+144808@0x1800000+4049496@0x18235a8
symbols @ 0xfee3e300 74+290352+153040 start=0x1000000
chain: calling OF_chain(800000, e478, 1000000, fffa5a80, 18)
[ using 444240 bytes of netbsd ELF symbol table ]
consinit()
setting up stdin
chosen = f002d974, stdin @ 0x1817b90
stdin instance = fff73f48
stdin node = f0069860
setting up stdout
stdout instance = fff99e50
stdout package = f0069860
buffer @ 0x1c09d90
console is /pci@1f,0/pci@1,1/ebus@1/se@14,400000:a
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 1.6 (KRAKEN) #0: Thu Feb 27 00:35:16 GMT 2003
    francis@cepre.repton.int:/usr/src/sys/arch/sparc64/compile/KRAKEN
total memory = 512 MB
avail memory = 466 MB
using 3289 buffers containing 26312 KB of memory
bootpath: /pci@1f,0/pci@1,0/scsi@1,0/disk@1,0
mainbus0 (root): SUNW,UltraSPARC-IIi-Engine
cpu0 at mainbus0: SUNW,UltraSPARC-IIi @ 440.048 MHz, version 0 FPU
cpu0: physical 32K instruction (32 b/l), 16K data (32 b/l), 2048K external (64 b/l)
psycho0 at mainbus0 addr 0xfffc0000
SUNW,sabre: impl 0, version 0: ign 7c0 bus range 0 to 128; PCI bus 0
Power Failure Detected: Shutting down NOW.

This happens with the GENERIC and INSTALL kernels as well as my custom
kernel.  I managed to install NetBSD by booting Solaris 9 first and then warm
booting NetBSD; this sometimes, but not always, prevents the problem (I
haven't yet tracked down the exact conditions under which it helps).

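I had a look around in the kernel source and found that the following changes
to arch/sparc64/dev/psycho.c, which comment out the code that attaches the
psycho_powerfail interrupt handler, fix the problem even without booting
Solaris beforehand.  Note that I ran diff with the files in reverse order, so
the lines marked "-" are my modified version and the lines marked "+" are the
original: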

--- psycho.c	Thu Feb 27 15:06:46 2003
+++ psycho.c.orig	Thu Feb 27 09:41:37 2003
@@ -83,9 +83,7 @@
 static int psycho_ce __P((void *));
 static int psycho_bus_a __P((void *));
 static int psycho_bus_b __P((void *));
-#if 0
 static int psycho_powerfail __P((void *));
-#endif
 static int psycho_wakeup __P((void *));
 
 
@@ -450,11 +448,9 @@
 		psycho_set_intr(sc, 15, psycho_bus_b,
 			&sc->sc_regs->pciberr_int_map, 
 			&sc->sc_regs->pciberr_clr_int);
-#if 0
 		psycho_set_intr(sc, 15, psycho_powerfail,
 			&sc->sc_regs->power_int_map, 
 			&sc->sc_regs->power_clr_int);
-#endif
 		psycho_set_intr(sc, 1, psycho_wakeup,
 			&sc->sc_regs->pwrmgt_int_map, 
 			&sc->sc_regs->pwrmgt_clr_int);
@@ -746,19 +742,18 @@
 		(long long)regs->psy_pcictl[0].pci_afsr);
 	return (1);
 }
-#if 0
 static int 
 psycho_powerfail(arg)
 	void *arg;
 {
 
 	/*
-	 * We lost power. (allegedly).
+	 * We lost power.  Try to shut down NOW.
 	 */
-	printf("Power Failure Detected, would shut down NOW without Francis' hack.\n");
+	printf("Power Failure Detected: Shutting down NOW.\n");
+	cpu_reboot(RB_POWERDOWN|RB_HALT, NULL);
 	return (1);
 }
-#endif
 static 
 int psycho_wakeup(arg)
 	void *arg;


Can anyone suggest a cleaner fix?  The board has ASM temperature and voltage
monitoring, which I think may have something to do with this.
http://www.sun.com/products-n-solutions/nep/hardware/boards/axi/docs/appnote.pdf
has more info.  I've tried a different power supply but this did not help.
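
For what it's worth, here is a rough sketch of a slightly less crude variant
of my hack: keep the handler attached, but let a flag decide whether it only
logs the event or really powers off.  This is a standalone userland mock-up,
not kernel code.  The flag psycho_ignore_powerfail, the function
psycho_powerfail_sketch and the cpu_reboot_stub stand-in are names I made up
for illustration, and the RB_* values are placeholders rather than the real
ones from <sys/reboot.h>.

```c
#include <stdio.h>

/* Placeholder flag values for this mock-up only (assumption, not the
 * real definitions from <sys/reboot.h>). */
#define RB_HALT      0x1
#define RB_POWERDOWN 0x2

/* Hypothetical tunable: nonzero means "log the event but keep running". */
static int psycho_ignore_powerfail = 1;

/* Bookkeeping so the behaviour can be observed outside a kernel. */
static int powerfail_events;	/* how many times the handler ran */
static int reboot_calls;	/* how many times a shutdown was requested */
static int last_howto;		/* flags passed to the last shutdown request */

/* Stand-in for cpu_reboot(); it only records that it was called. */
static void
cpu_reboot_stub(int howto, char *bootstr)
{
	(void)bootstr;
	reboot_calls++;
	last_howto = howto;
}

/* Sketch of a power-fail handler that can be told to ignore the event. */
static int
psycho_powerfail_sketch(void *arg)
{
	(void)arg;
	powerfail_events++;

	if (psycho_ignore_powerfail) {
		/* Possibly bogus event: report it and carry on. */
		printf("Power Failure Detected: ignored (event %d).\n",
		    powerfail_events);
		return (1);
	}

	/* Original behaviour: try to shut down NOW. */
	printf("Power Failure Detected: Shutting down NOW.\n");
	cpu_reboot_stub(RB_POWERDOWN|RB_HALT, NULL);
	return (1);
}
```

With the flag set, calling psycho_powerfail_sketch(NULL) just logs and
returns; with it cleared, the shutdown path is taken as in the stock handler.
This still only hides the symptom, so I would prefer a fix that addresses why
the interrupt fires in the first place.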

Thanks,

Francis