Subject: Re: IP20 unsuccessful install
To: None <wileyc@rezrov.net>
From: Havard Eidnes <he@netbsd.org>
List: port-sgimips
Date: 03/30/2004 16:23:33

Hi,

me again...  For some reason or other we could not get the SGI to
successfully complete its self-tests when the IBM drive was
plugged in.  We therefore ended up with a 4GB Seagate Barracuda
drive on target 2, and the firmware seemed to be happy with that.

Booted the sysinst kernel over the net, and restarted the
installation.  However, the kernel still dies with a null pointer
de-reference in:

150 Opening BINARY mode data connection for 'comp.tgz' (22052130 bytes).

 16% |*****                                |  3625 KB  905.82 KB/s    00:19 ETA
Stopped at      0x882e10c8:     lw      v0,8(s3)
db> trace
882e1064+64 (1,bfb8011f,1,17) ra 882e3b3c sz 40
882e3a38+104 (1,bfb8011f,1,17) ra 881fa68c sz 72
881fa394+2f8 (1,bfb8011f,1,17) ra 881d4314 sz 32
881d40bc+258 (1,bfb8011f,1,17) ra 882b1774 sz 32
882b16c4+b0 (1,bfb8011f,1,17) ra 882aeb70 sz 48
882aeaf8+78 (1,bfb8011f,1,17) ra 8828ac50 sz 40
mips3_KernIntr+84 (8890f3f0,0,c301c000,0) ra 88069178 sz 128
cpu_switch+68 (8890f3f0,0,c301c000,0) ra 881f4514 sz 24
881f42f0+224 (8890f3f0,0,c301c000,0) ra 881f3af8 sz 56
881f38a0+258 (8890f3f0,0,c301c000,0) ra 882087e4 sz 56
88208514+2d0 (8890f3f0,0,c301df68,0) ra 882918a4 sz 264
882916b8+1ec (8890f3f0,0,c301df68,0) ra 8828aa9c sz 80
mips3_SystemCall+b4 (8890f3f0,0,c301df68,0) ra 5533e0 sz 0
PC 0x5533e0: not in kernel space
0+5533e0 (8890f3f0,0,c301df68,0) ra 0 sz 0
User-level: curlwp NULL
db> show reg
...
s3          0
...
0x882e10c8:     lw      v0,8(s3)
db>


which I suspect is still the abort() routine in the HBA driver.
...yep, 0x882e1064 is wd33c93_abort() with the next one in line
being wd33c93_timeout().  Is there any chance that this part of
the code could be made more robust against passing of "strange"
arguments, or is this something which is supposed to absolutely
never happen?
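
As a quick sanity check on the trace, the first frame's base plus its
offset should land on the faulting PC, and the faulting load should
resolve to s3 + 8.  A small Python sketch of that arithmetic follows;
the variable names are mine and purely illustrative, the values are
copied from the ddb output above, and I am assuming the "+64" in the
trace is hexadecimal (which is the reading that makes the numbers
agree):

```python
# Values copied from the ddb session; the names are illustrative only.
frame_base = 0x882e1064    # first trace frame, identified as wd33c93_abort()
frame_offset = 0x64        # the "+64" in that frame, assumed to be hex
stopped_at = 0x882e10c8    # address printed after "Stopped at"

s3 = 0x0                   # from "show reg"
load_offset = 8            # the displacement in: lw v0,8(s3)

# Base plus offset of the innermost frame should be the faulting PC.
faulting_pc = frame_base + frame_offset

# The load reads from s3 + 8, i.e. 8 bytes past a null pointer.
effective_address = s3 + load_offset

print(hex(faulting_pc), faulting_pc == stopped_at)
print(hex(effective_address))
```

So the fault looks like a read of a field at offset 8 through a null
pointer held in s3, inside the routine that starts at 0x882e1064.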

Dmesg from net-booting the install kernel is attached below.

Hmm, looking at /kern/msgbuf after rebooting the install kernel
(managed to do that without powering off), I observe the
following extra stuff before the autoconf output for the present
run:

WARNING: clock gained 5 days -- CHECK AND RESET THE DATE!
root file system type: ffs
sd0: no disk label
sd0: no disk label
sd0: no disk label
sd0: no disk label
sd0: no disk label
sq0: Unexpected interrupt!
sq0: Unexpected interrupt!
sq0: Unexpected interrupt!
sq0: Unexpected interrupt!
sq0: Unexpected interrupt!
sd0(wdsc0:0:2:0): wdsc0: timed out; asr=0x00 [acb 0x889157b0 (flags 0x11, dleft 0)], <state 1, nexus 0x0, resid 10000, msg(q 0,o 100)>trap: TLB miss (load or instr. fetch) in kernel mode
status=0xf802, cause=0x30000008, epc=0x882e10c8, vaddr=0x8
curlwp == NULL ksp=0xc301db60
sd0: cache synchronization failed
rebooting...

so it has at least had one sd0 time-out before, but I'm uncertain
as to whether the trap is related to the printed-out time-out.
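
For what it is worth, the numbers on the trap line can be compared
against the ddb session with a few lines of Python.  This is only a
sketch: the variable names are mine, the trap line is the one from the
msgbuf output with the mail encoding undone, and the only outside fact
used is that bits 6..2 of the MIPS Cause register hold the exception
code (2 being a TLB miss on load or instruction fetch):

```python
import re

# Trap line copied from the msgbuf output above.
trap_line = "status=0xf802, cause=0x30000008, epc=0x882e10c8, vaddr=0x8"

# Collect the name=0x... pairs into a dict of integers.
fields = {
    name: int(value, 16)
    for name, value in re.findall(r"(\w+)=(0x[0-9a-fA-F]+)", trap_line)
}

# Bits 6..2 of the MIPS Cause register are the exception code.
exc_code = (fields["cause"] >> 2) & 0x1f

ddb_stopped_at = 0x882e10c8   # "Stopped at" address from the ddb session
s3_plus_8 = 0x0 + 8           # lw v0,8(s3) with s3 = 0

print(exc_code)
print(fields["epc"] == ddb_stopped_at)
print(fields["vaddr"] == s3_plus_8)
```

The epc and vaddr values agree with the ddb output, so the logged trap
appears to be the same null-pointer load.  That does not by itself
settle whether the time-out message printed just before it is the
cause.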


Regards,

- Håvard


[ Kernel symbol table missing! ]
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 1.6ZK (INSTALL32_IP2x) #18: Wed Mar 24 22:17:13 CET 2004
        he@stegg.urc.uninett.no:/usr/users/he/src/sys/arch/sgimips/compile/obj.sgimips/INSTALL32_IP2x
49152 KB memory, 40872 KB free, 768 KB for ARCS
mainbus0 (root): SGI-IP20 [SGI, 6906a2c8], 1 processor
cpu0 at mainbus0: MIPS R4000 CPU (0x422) Rev. 2.2 with MIPS R4010 FPC Rev. 0.0
cpu0: 8KB/16B direct-mapped L1 Instruction cache, 48 TLB entries
cpu0: 8KB/16B direct-mapped write-back L1 Data cache
cpu0: 1024KB/128B direct-mapped write-back L2 Unified cache
int0 at mainbus0 addr 0x1fb801c0: bus 50MHz, CPU 100MHz
imc0 at mainbus0 addr 0x1fa00000: revision 1
gio0 at imc0
unknown GIO card (product 0x7f revision 0xff) at gio0 slot 2 addr 0x1f000000 not configured
hpc0 at gio0 addr 0x1fb80000: SGI HPC1.5
zsc0 at hpc0 offset 0xd10
zstty0 at zsc0 channel 1 (console i/o)
zstty1 at zsc0 channel 0
zsc1 at hpc0 offset 0xd00
zsc1:  channel 1 not configured
zsc1:  channel 0 not configured
int0: cannot share interrupts yet.
sq0 at hpc0 offset 0x100: SGI Seeq 80c03
sq0: Ethernet address 08:00:69:06:a2:c8
wdsc0 at hpc0 offset 0x11f: WD33C93B SCSI, rev=0, target 0
scsibus0 at wdsc0: 8 targets, 8 luns per target
dpclock0 at hpc0 offset 0xe00
biomask 07 netmask 07 ttymask 0f clockmask bf
md0: internal 3072 KB image area
scsibus0: waiting 2 seconds for devices to settle...
sd0 at scsibus0 target 2 lun 0: <SEAGATE, SX15150N, 9611> disk fixed
sd0: drive offline
sd0: sync (200.00ns offset 12), 8-bit (5.000MB/s) transfers, tagged queueing
boot device: sq0
root on md0a dumps on md0b
WARNING: clock gained 5 days -- CHECK AND RESET THE DATE!
root file system type: ffs
tset: terminal type :?vt100 is unknown
Terminal type? 
