Subject: IP20 unsuccessful install
To: None <port-sgimips@netbsd.org>
From: Havard Eidnes <he@netbsd.org>
List: port-sgimips
Date: 03/28/2004 23:01:30

Hi,

We recently physically brought back an old IP20 machine with the
intent of installing/testing NetBSD on it.  After having tried my
own cross-compiled install kernel 3 or 4 times, I've concluded
that there is something fishy going on, possibly triggered by our
hardware.

Typically, we walk through most of sysinst, tell it to partition
the disk and fetch the install sets, but when it comes time to
unpack the sets, it invariably drops to DDB after a while:

 61% |**********************               |  4064 KB   61.56 KB/s  - stalled -
Stopped at      0x882e10c8:     lw      v0,8(s3)
db>

Inspection of the registers shows that s3 is 0, so this appears to
be a null pointer dereference in the kernel.

The decoded stack backtrace (had to use the symbols file) is:

db> trace
wd33c93_abort+64 (1,bfb8011f,1,17) ra 882e3b3c sz 40
wd33c93_timeout+104 (1,bfb8011f,1,17) ra 881fa68c sz 72
softclock+2f8 (1,bfb8011f,1,17) ra 881d4314 sz 32
hardclock+258 (1,bfb8011f,1,17) ra 882b1774 sz 32
mips3_clock_intr+b0 (1,bfb8011f,1,17) ra 882aeb70 sz 48
cpu_intr+78 (1,bfb8011f,1,17) ra 8828ac50 sz 40
mips3_KernIntr+84 (8890f480,0,c3018000,0) ra 88069178 sz 128
cpu_switch+68 (8890f480,0,c3018000,0) ra 881f4514 sz 24
mi_switch+224 (8890f480,0,c3018000,0) ra 881f3af8 sz 56
ltsleep+258 (8890f480,0,c3018000,0) ra 88209814 sz 56
882095b8+25c (8890f480,0,c3018000,0) ra 88207770 sz 64
dofileread+b0 (88906cc0,0,c3018000,1005cbf0) ra 8820769c sz 96
sys_read+8c (88906cc0,0,c3018000,1005cbf0) ra 882918a4 sz 56
syscall_plain+1ec (88906cc0,0,c3018000,1005cbf0) ra 8828aa9c sz 80
mips3_SystemCall+b4 (88906cc0,0,c3018000,1005cbf0) ra 553420 sz 0
PC 0x553420: not in kernel space
0+553420 (88906cc0,0,c3018000,1005cbf0) ra 0 sz 0
User-level: curlwp NULL
db>

As can be seen from the "stalled" message above, it's been doing
approximately nothing for a while before this problem strikes.

We were slightly uncertain how the unit selector connector should
be installed on our 1.2GB <IBM OEM, 0663E15, eSfS> drive, because
the disk responds on all targets...  However, all the other disk
writing done up to that point from within sysinst has apparently
worked OK.

The dmesg is attached below.

To my eyes (looking at objdump output and the source of
wd33c93_abort()), this happens somewhere in

        scsipi_printaddr(acb->xs->xs_periph);

near the top of the function, and it appears that it's acb that's
NULL (s3 was copied from a1, the second argument, at 12b4); it's the

    12f8:       8e620008        lw      v0,8(s3)

instruction it stops at.  Below is disassembly of the first part
of wd33c93_abort(), as well as "show reg" output from DDB.
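The correspondence between the DDB addresses and the objdump offsets can
be checked with a little arithmetic. The sketch below is only an
illustration: the constants are copied from the trace, the "show reg"
output and the listing in this report, the helper names are made up, and
it assumes the object's text was relocated into the kernel as one unit.

```c
#include <stdint.h>

/* Values copied from the DDB output and the objdump listing in this report. */
#define DDB_PC        0x882e10c8u  /* "Stopped at" address */
#define DDB_RA        0x882e10c4u  /* ra from "show reg" */
#define TRACE_OFFSET  0x64u        /* from "wd33c93_abort+64" in the trace */
#define OBJ_FUNC      0x1294u      /* wd33c93_abort in the object file */

/* Object-file offset of the instruction named by "wd33c93_abort+64". */
uint32_t obj_offset_of_fault(void)
{
    return OBJ_FUNC + TRACE_OFFSET;
}

/*
 * Difference between a kernel address and its object-file offset.
 * Assumption: the object's text section is relocated as one unit.
 */
uint32_t reloc_delta(void)
{
    return DDB_PC - obj_offset_of_fault();
}

/* Map any kernel address inside this object back to an objdump offset. */
uint32_t obj_offset_of(uint32_t kernel_addr)
{
    return kernel_addr - reloc_delta();
}
```

Under that assumption, obj_offset_of_fault() gives 0x12f8, matching the
lw instruction quoted above, and obj_offset_of(DDB_RA) gives 0x12f4, the
return point of the jal at 12ec. That is consistent with the fault
happening right after the last bus_space_read_1 call.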

Now, as to why it decided that it needed to abort the I/O, I have
no idea, and I also don't know why it didn't get any acb...
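As a debugging aid, the acb->xs->xs_periph chain could be walked behind a
NULL check so that the abort path prints something instead of faulting.
The following is only a sketch of that idea: the struct types are
hypothetical stand-ins, not the real NetBSD definitions, and the helper
name is made up.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration, NOT the real kernel structures. */
struct periph_stub { int unused; };
struct xfer_stub   { struct periph_stub *xs_periph; };
struct acb_stub    { struct xfer_stub *xs; };

/*
 * Walk acb->xs->xs_periph, returning NULL instead of dereferencing
 * a NULL acb or a NULL acb->xs.
 */
struct periph_stub *acb_periph_or_null(const struct acb_stub *acb)
{
    if (acb == NULL || acb->xs == NULL)
        return NULL;
    return acb->xs->xs_periph;
}
```

A caller would print the address only when the result is non-NULL. This
would hide the crash rather than explain it, so it is useful mainly for
getting the rest of the abort message out while hunting for the reason
the timeout fires without an acb.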

Hints for further debugging gratefully accepted.  I think our
next move will be to try another drive...

Regards,

- Håvard


00001294 <wd33c93_abort>:
    1294:       27bdffd8        addiu   sp,sp,-40
    1298:       afbf0024        sw      ra,36(sp)
    129c:       afb40020        sw      s4,32(sp)
    12a0:       afb3001c        sw      s3,28(sp)
    12a4:       afb20018        sw      s2,24(sp)
    12a8:       afb10014        sw      s1,20(sp)
    12ac:       afb00010        sw      s0,16(sp)
    12b0:       00808821        move    s1,a0
    12b4:       00a09821        move    s3,a1
    12b8:       8c840124        lw      a0,292(a0)
    12bc:       8e250128        lw      a1,296(s1)
    12c0:       00c0a021        move    s4,a2
    12c4:       0c000000        jal     0 <wd33c93_attach>
                        12c4: R_MIPS_26 bus_space_read_1
    12c8:       00003021        move    a2,zero
    12cc:       8e240124        lw      a0,292(s1)
    12d0:       8e250128        lw      a1,296(s1)
    12d4:       24070017        li      a3,23
    12d8:       00003021        move    a2,zero
    12dc:       0c000000        jal     0 <wd33c93_attach>
                        12dc: R_MIPS_26 bus_space_write_1
    12e0:       00409021        move    s2,v0
    12e4:       8e250128        lw      a1,296(s1)
    12e8:       8e240124        lw      a0,292(s1)
    12ec:       0c000000        jal     0 <wd33c93_attach>
                        12ec: R_MIPS_26 bus_space_read_1
    12f0:       24060001        li      a2,1
    12f4:       00408021        move    s0,v0
    12f8:       8e620008        lw      v0,8(s3)
    12fc:       00000000        nop
    1300:       8c440030        lw      a0,48(v0)
    1304:       00000000        nop
    1308:       8c820004        lw      v0,4(a0)
    130c:       00000000        nop
    1310:       8c420000        lw      v0,0(v0)
    1314:       00000000        nop
    1318:       8c42000c        lw      v0,12(v0)
    131c:       00000000        nop
    1320:       0040f809        jalr    v0
    1324:       00000000        nop
    1328:       3c040000        lui     a0,0x0
                        1328: R_MIPS_HI16       .rodata
    132c:       24840198        addiu   a0,a0,408
                        132c: R_MIPS_LO16       .rodata
    1330:       02003021        move    a2,s0
    1334:       02802821        move    a1,s4
    1338:       0c000000        jal     0 <wd33c93_attach>
                        1338: R_MIPS_26 printf


db> show reg
at          0x88320004
v0          0x16
v1          0xf800
a0          0x1
a1          0xbfb8011f
a2          0x1
a3          0x17
t0          0xc0046034
t1          0x882916b8
t2          0xffffffff
t3          0x880690ac
t4          0
t5          0
t6          0
t7          0
s0          0x16
s1          0xc0046000
s2          0
s3          0
s4          0x8832a798
s5          0
s6          0
s7          0x8830d790
t8          0
t9          0x5dcc00
k0          0
k1          0
gp          0x8863f1b0
sp          0xc3019c10
fp          0x1
ra          0x882e10c4
sr          0xf802
mdlo        0x5c26ce98
mdhi        0xbfb
bad         0
cs          0
pc          0x882e10c8
0x882e10c8:     lw      v0,8(s3)
db> 

Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 1.6ZK (INSTALL32_IP2x) #18: Wed Mar 24 22:17:13 CET 2004
        he@stegg.urc.uninett.no:/usr/users/he/src/sys/arch/sgimips/compile/obj.sgimips/INSTALL32_IP2x
49152 KB memory, 40872 KB free, 768 KB for ARCS
mainbus0 (root): SGI-IP20 [SGI, 6906a2c8], 1 processor
cpu0 at mainbus0: MIPS R4000 CPU (0x422) Rev. 2.2 with MIPS R4010 FPC Rev. 0.0
cpu0: 8KB/16B direct-mapped L1 Instruction cache, 48 TLB entries
cpu0: 8KB/16B direct-mapped write-back L1 Data cache
cpu0: 1024KB/128B direct-mapped write-back L2 Unified cache
int0 at mainbus0 addr 0x1fb801c0: bus 50MHz, CPU 100MHz
imc0 at mainbus0 addr 0x1fa00000: revision 1
gio0 at imc0
unknown GIO card (product 0x7f revision 0xff) at gio0 slot 2 addr 0x1f000000 not configured
hpc0 at gio0 addr 0x1fb80000: SGI HPC1.5
zsc0 at hpc0 offset 0xd10
zstty0 at zsc0 channel 1 (console i/o)
zstty1 at zsc0 channel 0
zsc1 at hpc0 offset 0xd00
zsc1:  channel 1 not configured
zsc1:  channel 0 not configured
int0: cannot share interrupts yet.
sq0 at hpc0 offset 0x100: SGI Seeq 80c03
sq0: Ethernet address 08:00:69:06:a2:c8
wdsc0 at hpc0 offset 0x11f: WD33C93B SCSI, rev=0, target 0
scsibus0 at wdsc0: 8 targets, 8 luns per target
dpclock0 at hpc0 offset 0xe00
biomask 07 netmask 07 ttymask 0f clockmask bf
md0: internal 3072 KB image area
scsibus0: waiting 2 seconds for devices to settle...
sd0 at scsibus0 target 1 lun 0: <IBM OEM, 0663E15, eSfS> disk fixed
sd0: drive offline
sd0: sync (200.00ns offset 12), 8-bit (5.000MB/s) transfers, tagged queueing
sd1 at scsibus0 target 2 lun 0: <IBM OEM, 0663E15, eSfS> disk fixed
sd1: drive offline
sd1: sync (200.00ns offset 12), 8-bit (5.000MB/s) transfers, tagged queueing
sd2 at scsibus0 target 3 lun 0: <IBM OEM, 0663E15, eSfS> disk fixed
sd2: drive offline
sd2: sync (200.00ns offset 12), 8-bit (5.000MB/s) transfers, tagged queueing
sd3 at scsibus0 target 4 lun 0: <IBM OEM, 0663E15, eSfS> disk fixed
sd3: drive offline
sd3: sync (200.00ns offset 12), 8-bit (5.000MB/s) transfers, tagged queueing
sd4 at scsibus0 target 5 lun 0: <IBM OEM, 0663E15, eSfS> disk fixed
sd4: drive offline
sd4: sync (200.00ns offset 12), 8-bit (5.000MB/s) transfers, tagged queueing
sd5 at scsibus0 target 6 lun 0: <IBM OEM, 0663E15, eSfS> disk fixed
sd5: drive offline
sd5: sync (200.00ns offset 12), 8-bit (5.000MB/s) transfers, tagged queueing
sd6 at scsibus0 target 7 lun 0: <IBM OEM, 0663E15, eSfS> disk fixed
sd6: drive offline
sd6: sync (200.00ns offset 12), 8-bit (5.000MB/s) transfers, tagged queueing
boot device: sd0
root on md0a dumps on md0b
WARNING: clock gained 3 days -- CHECK AND RESET THE DATE!
root file system type: ffs
