Subject: Re: MP?
To: Havard Eidnes <he@netbsd.org>
From: Johnny Billquist <bqt@update.uu.se>
List: port-alpha
Date: 01/22/2004 04:22:01
On Tue, 20 Jan 2004, Havard Eidnes wrote:

> > I could probably also hook up the console so that it is
> > accessible from the net if really needed.
>
> A logging console is always good. ;-)
>
> I'm not sure I'll be able to offer much in terms of assistance in
> tracking down the problem you are seeing, but to be able to at
> least start looking at it, it would definitely help to get an
> exact printout of what you observe on the console.

Well, here is what happens.
Sorry it took so long. One CPU card croaked last night. I had to replace
it, and unfortunately I no longer have two identical CPU cards around, so
now the system has three different CPUs instead.

The crash is identical, however...

That's the story so far, and this is what it says on the console:

-----


VMS PALcode V5.56-7, OSF PALcode V1.45-12

starting console on CPU 0
Testing Memory bank 0
Testing Memory bank 1
Testing Memory bank 2
Testing Memory bank 3
Configuring Memory Modules
probing hose 0, PCI
probing PCI-to-EISA bridge, bus 1
bus 0, slot 0 -- ewa -- DECchip 21040-AA
bus 0, slot 1 -- pka -- NCR 53C810
bus 1, slot 2 -- vga -- Compaq Qvision

** keyboard error **
starting console on CPU 1
starting console on CPU 2
os_type: UNIX - console CIPCA driver not started
Memory Testing and Configuration Status
Module   Size    Base Addr   Intlv Mode  Intlv Unit  Status
------   -----   ---------   ----------  ----------  ------
  0      512MB   00000000      1-Way         0       Passed
  1      128MB   20000000      2-Way         0       Passed
  2      128MB   20000000      2-Way         1       Passed
Total Bad Pages 0
Testing the System
Testing the Disks (read only)
Testing the Network
AlphaServer 2100 Console V5.3-2, built on Oct 16 1998 at 11:32:56

CPU 0 booting

(boot dka400.4.0.1.0 -flags A)
block 0 of dka400.4.0.1.0 is a valid boot block
reading 15 blocks from dka400.4.0.1.0
P00>>>^C
P00>>>sho conf
                        Digital Equipment Corporation
                           AlphaServer 2100 4/275

SRM Console V5.3-2              VMS PALcode V5.56-7, OSF PALcode V1.45-12

Component       Status          Module ID
CPU 0              P            B2024-AA DECchip (tm) 21064A-2
CPU 1              P            B2020-BA DECchip (tm) 21064A-2
CPU 2              P            B2020-AA DECchip (tm) 21064-3
Memory 0           P            B2022-CA 512 MB
Memory 1           P            B2021-CA 128 MB
Memory 2           P            B2021-CA 128 MB
I/O                             B2110-AA
                                dva0.0.0.1000.0         RX26/RX23

 Slot   Option                  Hose 0, Bus 0, PCI
   0    DECchip 21040-AA        ewa0.0.0.0.0            08-00-2B-E2-58-6B
   1    NCR 53C810              pka0.7.0.1.0            SCSI Bus ID 7
                                dka0.0.0.1.0            SEAGATE ST19171N
                                dka200.2.0.1.0          RZ28
                                dka400.4.0.1.0          RZ29B
                                dka600.6.0.1.0          TOSHIBA XM-4101TASUNSLCD
   2    Intel 82375                                     Bridge to Bus 1, EISA

 Slot   Option                  Hose 0, Bus 1, EISA
   2    Compaq Qvision
P00>>>boot
(boot dka400.4.0.1.0 -flags A)
block 0 of dka400.4.0.1.0 is a valid boot block
reading 15 blocks from dka400.4.0.1.0
bootstrap code read in
base = 200000, image_start = 0, image_bytes = 1e00
initializing HWRPB at 2000
initializing page table at 2fff0000
initializing machine state
setting affinity to the primary CPU
jumping to bootstrap code

NetBSD/alpha 1.6_STABLE FFS Primary Bootstrap
Jumping to entry point...

NetBSD/alpha 1.6_STABLE Secondary Bootstrap, Revision 1.13
(bouyer@java, Fri Nov 22 12:25:50 MET 2002)

VMS PAL rev: 0x4000700010538
OSF PAL rev: 0x4000c0002012d
Switch to OSF PAL code succeeded.

Boot flags: A
6779248+445480 [402648+255977]=0x784ee0

Entering netbsd at 0xfffffc0000301460...
WARNING: Setting hz to 512. hwrpb claims it is 1024
consinit: not using prom console
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 1.6ZH (GENERIC.MP) #1: Mon Jan 19 03:38:19 CET 2004
        root@Pony.BQTnet.SE:/usr/obj/sys/arch/alpha/compile/GENERIC.MP
AlphaServer 2100 4/275, 274MHz, s/n ay42522292
8192 byte page size, 3 processors.
total memory = 768 MB
(2120 KB reserved for PROM, 765 MB used by NetBSD)
avail memory = 744 MB
mainbus0 (root)
cpu0 at mainbus0: ID 0 (primary), 21064A-2
cpu1 at mainbus0: ID 1, 21064A-2
cpu1: processor failed to hatch
cpu2 at mainbus0: ID 2, 21064-1
cpu2: processor failed to hatch
ttwoga0 at mainbus0
ttwopci0 at ttwoga0 hose 0: T2 Gate Array rev. 1
sableio0 at ttwopci0 bus 0: Sable STDIO module
pckbc0 at sableio0 port 0x60
fdc1 at sableio0 port 0x3f0
fdc1: interrupting at T2 irq 7
com0 at sableio0 port 0x3f8: ns16550a, working fifo
com0: console
com0: interrupting at T2 irq 15
com1 at sableio0 port 0x2f8: ns16550a, working fifo
com1: interrupting at T2 irq 8
lpt0 at sableio0 port 0x3bc
lpt0: interrupting at T2 irq 9
pci0 at ttwopci0 bus 0
pci0: i/o space, memory space enabled
tlp0 at pci0 dev 0 function 0: DECchip 21040 Ethernet, pass 2.2
tlp0: interrupting at T2 irq 2
tlp0: Ethernet address 08:00:2b:e2:58:6b
tlp0: 10baseT, 10baseT-FDX, 10base5, manual
siop0 at pci0 dev 1 function 0: Symbios Logic 53c810 (fast scsi)
siop0: interrupting at T2 irq 1
scsibus0 at siop0: 8 targets, 8 luns per target
pceb0 at pci0 dev 2 function 0: Intel 82375EB/SB PCI-EISA Bridge (PCEB) (rev. 0x03)
eisa0 at pceb0
unknown Compaq device CPQ3011 at eisa0 slot 2 not configured
isa0 at pceb0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: PC speaker
spkr0 at pcppi0
isabeep0 at pcppi0
mcclock0 at isa0 port 0x70-0x71: mc146818 or compatible
fd0 at fdc1 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
Kernelized RAIDframe activated
scsibus0: waiting 2 seconds for devices to settle...
sd0 at scsibus0 target 0 lun 0: <SEAGATE, ST19171N, 0024> disk fixed
sd0: 8683 MB, 5268 cyl, 20 head, 168 sec, 512 bytes/sect x 17783112 sectors
sd0: sync (100.00ns offset 8), 8-bit (10.000MB/s) transfers, tagged queueing
sd1 at scsibus0 target 2 lun 0: <DEC, RZ28     (C) DEC, 441C> disk fixed
sd1(siop0:0:2:0):  Check Condition on CDB: 0x00 00 00 00 00 00
    SENSE KEY:  Not Ready
     ASC/ASCQ:  Logical Unit Not Ready, Initialization Command Required

sd1: drive offline
sd1: sync (100.00ns offset 8), 8-bit (10.000MB/s) transfers, tagged queueing
sd2 at scsibus0 target 4 lun 0: <DEC, RZ29B    (C) DEC, 0014> disk fixed
sd2: 4091 MB, 3708 cyl, 20 head, 113 sec, 512 bytes/sect x 8380080 sectors
sd2: sync (100.00ns offset 8), 8-bit (10.000MB/s) transfers, tagged queueing
cd0 at scsibus0 target 6 lun 0: <TOSHIBA, XM-4101TASUNSLCD, 3424> cdrom removable
cd0: async, 8-bit transfers
sd0: no disk label
sd1(siop0:0:2:0):  Check Condition on CDB: 0x00 00 00 00 00 00
    SENSE KEY:  Not Ready
     ASC/ASCQ:  Logical Unit Not Ready, Initialization Command Required

sd1: no disk label
root on sd2a dumps on sd2b
WARNING: possible botched superblock upgrade detected
on filesystem previously mounted on /
fs_bsize == fs_maxbsize (0x00002000) but FS_FLAGS_UPDATED is not set
Test your filesystem by running fsck_ffs -n -f on it.
If it reports:
``VALUES IN SUPER BLOCK DISAGREE WITH THOSE IN FIRST ALTERNATE''
you should be able to recover with fsck_ffs -b 32 -c 4
See the file src/UPDATING or
http://mail-index.netbsd.org/current-users/2004/01/11/0022.html
for more details
WARNING: clock gained 3 days -- CHECK AND RESET THE DATE!
root file system type: ffs
panic: kernel diagnostic assertion "p != NULL" failed: file "/usr/src/sys/kern/kern_synch.c", line 413
Stopped at      netbsd:cpu_Debugger+0x4:        ret     zero,(ra)
db{1}>

----

Hmm, two comments. First, I didn't notice the "processor failed to hatch"
messages before, but they might have been there. I never looked that far
back. I don't know if those lines are relevant in this case.

Second, the thing about hz being set to 512: that's a hack I had to do,
since the clock actually interrupts at 512 Hz, and not 1024, which, for
some reason, NetBSD thinks the system claims.
So it's just a piece of local code that prints something others might not
recognize.
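
To illustrate why the mismatch matters, here is a small Python sketch of
the arithmetic. It assumes timekeeping is driven purely by counting clock
interrupts, which is a simplification, and the names in it are made up
for the example; none of this is taken from the NetBSD source.

```python
# Hypothetical illustration only; these names are not NetBSD identifiers.
CLAIMED_HZ = 1024  # rate the kernel believes, per the boot warning
ACTUAL_HZ = 512    # rate at which the clock really interrupts here


def tick_usec(hz):
    """Length of one clock tick in whole microseconds."""
    return 1_000_000 // hz


def apparent_seconds(real_seconds, assumed_hz, actual_hz):
    """Seconds counted if each interrupt advances time by 1/assumed_hz
    while interrupts really arrive actual_hz times per second."""
    interrupts = real_seconds * actual_hz
    return interrupts / assumed_hz


if __name__ == "__main__":
    # Tick length at the claimed rate and at the real rate.
    print(tick_usec(CLAIMED_HZ), tick_usec(ACTUAL_HZ))
    # One real minute as counted with the wrong and the right hz.
    print(apparent_seconds(60, CLAIMED_HZ, ACTUAL_HZ))
    print(apparent_seconds(60, ACTUAL_HZ, ACTUAL_HZ))
```

Under that assumption, a kernel that believes 1024 while the clock ticks
at 512 Hz would count only half of the real elapsed time, which is why
the value has to be forced to 512.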

	Johnny

Johnny Billquist                  || "I'm on a bus
                                  ||  on a psychedelic trip
email: bqt@update.uu.se           ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol