Port-macppc archive


Re: NetBSD 5.0.2 can't handle 7455 CPU @ 1 GHz?



>>http://www.klos.com/~john/netbsd_noati.gz
>
>Got it.  Many thanks for the responses.
>
>I'll give these a try:
>
>1. boot without video
>
>2. boot with noati kernel
>
>3. (maybe) swap CPU back. (I'll have to figure out what I did with it. ;-> )
>
>I'll report back.
>
>-dgl-

I've tried the first few experiments, and have a few results.

First of all, with the stock kernel (see more below)
the hang only occurs when I "touch" the disk at wd0, connected
to the Promise Ultra133 card.  It doesn't matter if the video card is plugged
in or not.

I first decided to get myself a good test case.  To match the config of the
other box where I was having trouble (other disks/config, really), I put
in a second disk, connected to the Promise Ultra133 card.

After futzing with disklabel and pdisk a while (they argue!)
I landed a test case where if I did a "newfs /dev/wd0g", it would hang
reliably part way into the newfs.  I ran that a couple of times to verify the
hang.

I then pulled the video card.  (and spent some time figuring out how to get it
to actually boot without benefit of the video or serial console....)

Same hang.

Interestingly, the log shows this when the video card is in:

Dec 20 21:05:17 charm /netbsd: spdmem3: voltage LvTTL (not 5V tolerant), refresh time 15.625us (self-refreshing)
Dec 20 21:05:17 charm /netbsd: uni_n0 at mainbus0 address 0xf8000000
Dec 20 21:05:17 charm /netbsd: ki2c0 at uni_n0 address 0xf8001000
Dec 20 21:05:17 charm /netbsd: iic1 at ki2c0: I2C bus
Dec 20 21:05:17 charm /netbsd: uninorth0 at mainbus0
Dec 20 21:05:17 charm /netbsd: pci0 at uninorth0 bus 0
Dec 20 21:05:17 charm /netbsd: pci0: i/o space, memory space enabled
Dec 20 21:05:17 charm /netbsd: pchb0 at pci0 dev 11 function 0
Dec 20 21:05:17 charm /netbsd: pchb0: Apple Computer UniNorth AGP Interface (rev. 0x00)
Dec 20 21:05:17 charm /netbsd: r128fb0 at pci0 dev 16 function 0: ATI Technologies Rage Fury MAXX AGP 4x (TMDS)
Dec 20 21:05:17 charm /netbsd: r128fb0: 64 MB aperture at 0x94000000
Dec 20 21:05:17 charm /netbsd: wsdisplay0 at r128fb0 kbdmux 1: console (default, vt100 emulation)
Dec 20 21:05:17 charm /netbsd: wsmux1: connecting to wsdisplay0
Dec 20 21:05:17 charm /netbsd: uninorth1 at mainbus0
Dec 20 21:05:17 charm /netbsd: pci1 at uninorth1 bus 0
Dec 20 21:05:17 charm /netbsd: pci1: i/o space, memory space enabled


When the video card is removed, it shows:

Dec 20 21:49:30 charm /netbsd: spdmem3: voltage LvTTL (not 5V tolerant), refresh time 15.625us (self-refreshing)
Dec 20 21:49:30 charm /netbsd: uni_n0 at mainbus0 address 0xf8000000
Dec 20 21:49:30 charm /netbsd: ki2c0 at uni_n0 address 0xf8001000
Dec 20 21:49:30 charm /netbsd: iic1 at ki2c0: I2C bus
Dec 20 21:49:30 charm /netbsd: uninorth0 at mainbus0
Dec 20 21:49:30 charm /netbsd: pci0 at uninorth0 bus 0
Dec 20 21:49:30 charm /netbsd: pci0: i/o space, memory space enabled
Dec 20 21:49:30 charm /netbsd: pchb0 at pci0 dev 11 function 0
Dec 20 21:49:30 charm /netbsd: pchb0: Apple Computer UniNorth AGP Interface (rev. 0x00)
Dec 20 21:49:30 charm /netbsd: r128fb0 at pci0 dev 16 function 0: ATI Technologies Rage Fury MAXX AGP 4x (TMDS)
Dec 20 21:49:30 charm /netbsd: r128fb0: no width property 
Dec 20 21:49:30 charm /netbsd: uninorth1 at mainbus0 
Dec 20 21:49:30 charm /netbsd: pci1 at uninorth1 bus 0
Dec 20 21:49:30 charm /netbsd: pci1: i/o space, memory space enabled
Dec 20 21:49:30 charm /netbsd: pchb1 at pci1 dev 11 function 0
Dec 20 21:49:30 charm /netbsd: pchb1: Apple Computer UniNorth Host-PCI Bridge (rev. 0x00)
Dec 20 21:49:30 charm /netbsd: ppb0 at pci1 dev 13 function 0: Digital Equipment DC21154 PCI-PCI Bridge (rev. 0x05)
Dec 20 21:49:30 charm /netbsd: pci2 at ppb0 bus 1
Dec 20 21:49:30 charm /netbsd: pci2: i/o space, memory space enabled


Note that, apart from the new "no width property" message, the only lines
missing are the wsmux1 line and these two:

Dec 20 21:05:17 charm /netbsd: r128fb0: 64 MB aperture at 0x94000000
Dec 20 21:05:17 charm /netbsd: wsdisplay0 at r128fb0 kbdmux 1: console (default, vt100 emulation)


That was unexpected.
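
A comparison like this can also be done mechanically. What follows is a
minimal sketch, not the procedure used above: it strips the syslog prefix
from two saved excerpts of /var/log/messages and diffs what is left. The
function names (strip_prefix, boot_diff) and the file names in the usage
comment are made up for illustration.

```shell
#!/bin/sh
# Strip the "Mon DD HH:MM:SS host /netbsd: " prefix so that only the kernel
# message remains. Lines without that prefix pass through unchanged.
strip_prefix() {
    sed 's|^[A-Z][a-z][a-z]  *[0-9][0-9]* [0-9:][0-9:]* [^ ][^ ]* /netbsd: *||' "$1"
}

# Diff two boot-log excerpts after stripping the prefix.
# Lines marked "<" appear only in the first file, ">" only in the second.
# Usage (hypothetical file names): boot_diff with_card.log without_card.log
boot_diff() {
    _a=$(mktemp)
    _b=$(mktemp)
    strip_prefix "$1" > "$_a"
    strip_prefix "$2" > "$_b"
    diff "$_a" "$_b"
    _rc=$?
    rm -f "$_a" "$_b"
    return "$_rc"
}
```

Email line wrapping splits some log lines in two, so excerpts pasted from a
message should be un-wrapped first or the diff will be noisy.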

In any case, I then tried another test case, doing a newfs to an
unused partition on the disk on the motherboard ATA connector (boot disk:
/dev/rwd1f)

Solid - no hang. (caveat - see below)

Soooo.......

Whatever it is, it does not appear to be the video card.

Note that I also see big blocks of crap in the system log (/var/log/messages).
I'm not sure if this is the kernel spitting out crap in its death throes,
or some artifact of the disk being in a bad state at panic time.
The junk (mostly NULLs) _appears_ to be _instead_ of the data that
should be there, not in addition to it.  (i.e. somebody scribbled on
cache before it got written out?)
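
One way to put a number on the junk is to count the NUL bytes in the log.
This is only a sketch; count_nuls is a made-up helper name, and it says
nothing about how the bytes got there.

```shell
#!/bin/sh
# Count the NUL (0x00) bytes in a file.
# tr -cd deletes everything EXCEPT NUL, and wc -c counts what survives.
count_nuls() {
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}

# Usage: count_nuls /var/log/messages
```

Running it before and after a hang would show whether each crash adds a new
block of NULs.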

Hard to say, though.....

Chapter 2 - John Klos's noati kernel.....

The no-ati kernel had some option that suppressed
log messages, so all I got in the log was:

Dec 20 22:29:43 charm /netbsd: Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
Dec 20 22:29:43 charm /netbsd: 2006, 2007, 2008, 2009, 2010
Dec 20 22:29:43 charm /netbsd: The NetBSD Foundation, Inc.  All rights reserved.
Dec 20 22:29:43 charm /netbsd: Copyright (c) 1982, 1986, 1989, 1991, 1993
Dec 20 22:29:43 charm /netbsd: The Regents of the University of California.  All rights reserved.
Dec 20 22:29:43 charm /netbsd: 
Dec 20 22:29:43 charm /netbsd: NetBSD 5.1_STABLE (GENERIC_noATI) #0: Mon Dec 20 19:16:16 UTC 2010
Dec 20 22:29:43 charm /netbsd: john%sage.klos.com@localhost:/usr/obj/sys/arch/macppc/compile/GENERIC_noATI
Dec 20 22:29:43 charm /netbsd: total memory = 1280 MB
Dec 20 22:29:43 charm /netbsd: avail memory = 1228 MB
Dec 20 22:29:43 charm /netbsd: timecounter: Timecounters tick every 10.000 msec
Dec 20 22:29:43 charm /netbsd: found openpic PIC at 80040000
Dec 20 22:29:43 charm /netbsd: OpenPIC Version 1.2: Supports 4 CPUs and 64 interrupt sources.
Dec 20 22:29:43 charm /netbsd: bootpath:
Dec 20 22:29:43 charm savecore: no core dump 
Dec 20 22:29:43 charm /netbsd: Accounting started
Dec 20 22:29:44 charm ntpd[214]: ntpd 4.2.4p6-o Thu Jan  8 21:02:40 MET 2009 (import)
Dec 20 22:29:45 charm ntpd[233]: precision = 1.084 usec  


Fortunately, dmesg came through for me:

total memory = 1280 MB
avail memory = 1228 MB
timecounter: Timecounters tick every 10.000 msec
found openpic PIC at 80040000
OpenPIC Version 1.2: Supports 4 CPUs and 64 interrupt sources.
bootpath: /pci@f2000000/@d/mac-io@7/ata-4@1f000/disk@0:3,/netbsd
mainbus0 (root)
cpu0 at mainbus0: 7455 (Revision 2.1), ID 0 (primary)
cpu0: HID0 8450c0bc<EMCP,TBEN,NAP,DPM,ICE,DCE,SGE,BTIC,LRSTK,FOLD,BHT>, powersave: 1
cpu0: 1000.00 MHz, 256KB L2 cache no parity parity enabled, 2MB no-parity L3 cache (PB2 SRAM) at 4:1 ratio
memory0 at mainbus0: len=512
spdmem0 at memory0
spdmem0: SDRAM memory, no parity or ECC, 512MB, 143MHz (PC-1100)
spdmem0: 13 rows, 10 cols, 2 banks, 4 banks/chip, 7.0ns cycle time
spdmem0: tAA-tRCD-tRP-tRAS: 3-15-15-37
spdmem0: voltage LvTTL (not 5V tolerant), refresh time 7.8us (self-refreshing)
spdmem1 at memory0
spdmem1: SDRAM memory, no parity or ECC, 512MB, 143MHz (PC-1100)
spdmem1: 13 rows, 10 cols, 2 banks, 4 banks/chip, 7.0ns cycle time
spdmem1: tAA-tRCD-tRP-tRAS: 3-15-15-37
spdmem1: voltage LvTTL (not 5V tolerant), refresh time 7.8us (self-refreshing)
spdmem2 at memory0
spdmem2: SDRAM memory, no parity or ECC, 128MB, 125MHz (PC-1000)
spdmem2: 12 rows, 10 cols, 1 banks, 4 banks/chip, 8.0ns cycle time
spdmem2: tAA-tRCD-tRP-tRAS: 3-20-20-50
spdmem2: voltage LvTTL (not 5V tolerant), refresh time 15.625us (self-refreshing)
spdmem3 at memory0
spdmem3: SDRAM memory, no parity or ECC, 128MB, 125MHz (PC-1000)
spdmem3: 12 rows, 10 cols, 1 banks, 4 banks/chip, 8.0ns cycle time
spdmem3: tAA-tRCD-tRP-tRAS: 3-20-20-50
spdmem3: voltage LvTTL (not 5V tolerant), refresh time 15.625us (self-refreshing)
uni_n0 at mainbus0 address 0xf8000000
ki2c0 at uni_n0 address 0xf8001000
iic1 at ki2c0: I2C bus
uninorth0 at mainbus0
pci0 at uninorth0 bus 0
pci0: i/o space, memory space enabled
pchb0 at pci0 dev 11 function 0
pchb0: Apple Computer UniNorth AGP Interface (rev. 0x00)
genfb0 at pci0 dev 16 function 0: ATI Technologies Rage Fury MAXX AGP 4x (TMDS)
genfb0: framebuffer at 0x94008000, size 640x480, depth 8, stride 768
wsdisplay0 at genfb0 kbdmux 1: console (default, vt100 emulation)
wsmux1: connecting to wsdisplay0
drm at genfb0 not configured
uninorth1 at mainbus0
pci1 at uninorth1 bus 0
pci1: i/o space, memory space enabled

HOWEVER... when I ran my test case on "newfs /dev/rwd1f", (boot disk) it hung!

I ran the newfs on the boot disk again and it hung again.  In fact,
I found that the noati kernel is not reliable to boot.  I had to try several
times on each reboot, because it hung partway through.

When it finally came all the way back up, I decided to try something different.
I remember that I had a strange problem like this in 2005, and "fixed" it
by running a process that kept the CPU out of idle.  Just for grins, I fired
up a shell loop "while true ; do : ; done &" and ran my newfs on
the boot disk - 3 times success.
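
For anyone who wants to repeat this, the burner-plus-test routine could be
wrapped up roughly as follows. It is a sketch only: with_burner is a
made-up name, and the newfs invocation in the usage comment is destructive,
so point it at a scratch partition.

```shell
#!/bin/sh
# Run a command N times while a busy loop keeps the CPU out of idle,
# then stop the busy loop and report how many runs exited successfully.
# Usage: with_burner 3 newfs /dev/rwd1f   (wipes that partition!)
with_burner() {
    _runs=$1
    shift
    # The CPU burner: the same endless no-op loop as the one-liner above.
    while true; do :; done &
    _burner=$!
    _ok=0
    _i=0
    while [ "$_i" -lt "$_runs" ]; do
        if "$@"; then
            _ok=$((_ok + 1))
        fi
        _i=$((_i + 1))
    done
    # Stop the burner and reap it so no stray loop is left running.
    kill "$_burner" 2>/dev/null || true
    wait "$_burner" 2>/dev/null || true
    echo "$_ok/$_runs runs succeeded"
}
```

A real hang takes the whole machine down, so the script never gets to print
its summary in that case. It only saves typing for the runs that work.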

I then ran the newfs on the PCI-attached disk - 3 times success!

I killed the CPU burner, expecting a failure.

newfs on boot disk - 3 times success.

newfs on PCI-attached disk - 3 times success.

Argh.....

Reboot the noati kernel.....

newfs on boot disk - 2 times success (sigh)

newfs on pci-attach disk - HANG

reboot the noati kernel .....

newfs -> pci-attach disk - HANG

reboot the noati kernel

once up - start the CPU burner first thing.

newfs on PCI-attach disk - 3 times success

newfs on boot disk - 3 times success

Kill CPU burner

newfs on boot disk - HANG


------

OK.  Enough experiments for one night.

This is mighty strange.  The CPU-burner is definitely a factor, though.

Can I guess that this is another interrupt problem?  Is the CPU
vulnerable to some bug if an interrupt comes in when idle, vs when
busy (and has to be pulled into the kernel by an interrupt?)
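
One data point for the idle theory: the dmesg above reports HID0 as
8450c0bc with NAP and DPM in the decoded list, plus "powersave: 1". The
sketch below just re-checks those two bits from the hex value. The masks
(NAP = 0x00400000, DPM = 0x00100000) are the usual 74xx HID0 bit positions
as far as I know, and hid0_idle_bits is a made-up helper name. It shows the
power-saving bits are enabled, not that they cause the hang.

```shell
#!/bin/sh
# Report which idle-related HID0 bits are set in a hex value.
# Assumed masks: NAP = 0x00400000, DPM = 0x00100000.
hid0_idle_bits() {
    _v=$(( $1 ))
    _bits=""
    if [ $(( _v & 0x00400000 )) -ne 0 ]; then
        _bits="$_bits NAP"
    fi
    if [ $(( _v & 0x00100000 )) -ne 0 ]; then
        _bits="$_bits DPM"
    fi
    # Drop the leading space before printing.
    echo "${_bits# }"
}

# Usage: hid0_idle_bits 0x8450c0bc
```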

I've not yet swapped in the old CPU (nor found it yet)

Interesting, huh?

-dgl-

