Subject: Problems bringing 2.0C up on Challenge S system
To: None <port-sgimips@netbsd.org>
From: Havard Eidnes <he@netbsd.org>
List: port-sgimips
Date: 04/19/2004 18:18:36
Hi,

after working around

  port-sgimips/25202: boot failure on Challenge S system

by commenting out haltwo0 from the configuration file, I'm now
stuck with another problem while trying to bring up this system.

The problem is that as soon as I make the initial login on the
console, the machine panics, like so:

NetBSD/sgimips (minnesota.urc.uninett.no) (console)

login: root
Apr 19 18:02:12 minnesota login: Cannot update lastlogx Inappropriate file type or format
Apr 19 18:02:12 minnesota login: Cannot update lastlogx Inappropriate file type or format
Apr 19 18:02:12 minnesota login: ROOT LOGIN (root) ON console
Apr 19 18:02:12 minnesota login: ROOT LOGIN (root) ON console
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 2.0C (GENERIC32_IP2x) #24: Fpanic: cache error @ EPC 0x882b4098 ErrCtl 0x0 CacheErr 0xa0102c43
panic: cache error @ EPC 0x882b40ac ErrCtl 0x0 CacheErr 0xa03b153b
Stopped in pid 277.1 (csh) at   0x882b0428:panic: cache error @ EPC 0x88221ce0 ErrCtl 0x0 CacheErr 0xa033bec1
     Stopped in pid 277.1 (csh) at   0
x882b0428:      jr      rapanic: cache error @ EPC 0x88221ce0 ErrCtl 0x0 CacheErr 0xa033bed7
Stopped in pid 277.1 (csh) at 0x882b0428:     jr      r
a
                bdslot:panic: cache error @ EPC 0x88221ce0 ErrCtl 0x0 CacheErr 0xa033bf0f
 Stopped in pid 277.1 (csh) at   0x882b0428:     jr      r
a
                bdslot: nop
db>


This is eminently repeatable; apparently no progress can be made
at this point.  Also, any attempt at doing anything in DDB just
causes this same panic.

I suspect that the error is not "real"; the machine can do many
other things without any problems, such as running fsck on the 4GB
system disk and running the rc system to completion.
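
To compare the register values reported across the panics, the panic
lines can be pulled out of a console capture with a small script.  What
follows is only a minimal Python sketch and not part of any NetBSD
tooling: the names parse_panics and decode_cacheerr are made up for
illustration, and the CacheErr bit meanings (ER = bit 31, EC = bit 30,
ED = bit 29, ET = bit 28) are an assumption based on the R4000/R4400
register layout, which should be checked against the manual for the CPU
actually fitted in the machine.

```python
import re

# Matches panic lines of the form printed on the console, e.g.
#   panic: cache error @ EPC 0x882b40ac ErrCtl 0x0 CacheErr 0xa03b153b
PANIC_RE = re.compile(
    r"panic: cache error @ EPC\s+(0x[0-9a-fA-F]+)\s+"
    r"ErrCtl\s+(0x[0-9a-fA-F]+)\s+CacheErr\s+(0x[0-9a-fA-F]+)"
)

# ASSUMPTION: top bits of CacheErr as on the R4000/R4400; verify against
# the CPU manual before trusting the decoded names.
CACHEERR_FLAGS = {
    31: "ER (data reference)",
    30: "EC (secondary cache)",
    29: "ED (data field error)",
    28: "ET (tag field error)",
}


def parse_panics(text):
    """Return a list of (epc, errctl, cacheerr) integer tuples."""
    return [
        tuple(int(value, 16) for value in match.groups())
        for match in PANIC_RE.finditer(text)
    ]


def decode_cacheerr(value):
    """Return the names of the assumed flag bits that are set in value."""
    return [name for bit, name in CACHEERR_FLAGS.items() if value & (1 << bit)]


if __name__ == "__main__":
    sample = "panic: cache error @ EPC 0x882b40ac ErrCtl 0x0 CacheErr 0xa03b153b"
    for epc, errctl, cacheerr in parse_panics(sample):
        print(hex(epc), hex(errctl), hex(cacheerr), decode_cacheerr(cacheerr))
```

Every CacheErr value in both console logs begins with 0xa0; under the
assumed layout that would mean the ER and ED bits are set and EC and ET
are clear, but that reading is only as good as the assumption.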

I seem to recall some comments from earlier that there was a
certain "brittleness" to console serial I/O on the sgimips port,
when I earlier commented on sporadic occurrences of this problem on
my Indigo2 system.  Now that I check, my 1.6ZL-running Indigo2
can be crashed the exact same way -- try to log in as root on the
console, and "boom":

NetBSD/sgimips (viola.urc.uninett.no) (console)

login: root
Last login: Tue Feb 17 10:38:39 2004 on console
Apr 19 18:12:18 viola login: ROOT LOGIN (root) ON console
Apr 19 18:12:18 viola login: ROOT LOGIN (root) ON console
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 20panic: cache error @ EPC 0x88131914 ErrCtl 0x0 CacheErr 0xa0033ce9
panic: cache error @ EPC 0x88131928 ErrCtl 0x0 CacheErr 0xa01d2089
Stopped in pid 10254.1 (csh) at 0x8812d054:     jr      ra
                bdslot: nop
panic: cache error @ EPC 0x880c1e04 ErrCtl 0x0 CacheErr 0xa0171f88
Stopped in pid 10254.1 (csh) at 0x8812d054:     jr      ra
                bdslot: nop
db>panic: cache error @ EPC 0x880c1e5c ErrCtl 0x0 CacheErr 0xa0171f93
 Stopped in pid 10254.1 (csh) at     0x8812d054:     jr      ra
                bdslot: nop
db> panic: cache error @ EPC 0x8807a8bc ErrCtl 0x0 CacheErr 0xa01a9fe1
Stopped in pid 10254.1 (csh) at     0x8812d054:     jr      ra
                bdslot: nop
db> panic: cache error @ EPC 0x8807a8d8 ErrCtl 0x0 CacheErr 0xa01a9ff9
Stopped in pid 10254.1 (csh) at     0x8812d054:     jr      ra
                bdslot: nop
db>


Does anyone have any idea what might be causing this?  The kernel
I tried on the Challenge S server was compiled from sources from
last Friday; the time window for the 1.6ZL kernel was pretty narrow
(I don't recall it offhand, but it was right before 2.0 was branched).

BTW, this looks pretty much like a show-stopper bug for 2.0...

Regards,

- Håvard