Subject: 1 Ghz CPU in AGP G4 causes NetBSD 1.6.2 hangs
To: None <port-macppc@netbsd.org>
From: Donald Lee <MacPPC@caution.icompute.com>
List: port-macppc
Date: 09/26/2004 20:03:41
I acquired a "playtoy" G4/AGP so that I have a machine in which to install
my 1 GHz G4 CPU and see if I can fix the "lost interrupts" in the kernel.

The following is a slightly abbreviated history of events.  I am
convinced that we have a kernel problem with fast CPUs, and I am
quite surprised that this does not show up on the faster CPUs in
general.

I installed stock NetBSD 1.6.2 on the G4 (with its stock 450 MHz CPU).
I also installed Mac OS X 10.3 on an alternate disk.

I then installed the 1 GHz CPU. (256K L2, 2 MB L3 cache, 7455 CPU)

At first, the machine would not stay up long enough to do anything
useful.  It seemed to boot OK, and then as soon as I tried to use the network
(ftp) to fetch files, it would hang up.  (USB keyboard would lock up,
and external pings of the machine would fail.)

I switched back to the 450 MHz CPU and Mac OS X to satisfy myself that the
HW was OK.  I also checked out Mac OS X with the 1 GHz CPU.  All worked
fine.

Going back to NetBSD, I got the hangs again.  Each time I booted, the
first thing I would do is try to do an ftp to another
machine on my net and fetch my ".profile", ".rhosts", ".forward", etc.
Noticing that I was getting ugly errors from my logins, I managed to look
in the files I had fetched, and they were filled with trash.
The files were the right length, but were full of non-ASCII data.

At this point, I figured that the gem driver was busted, so I changed the
boot to not configure networking, and to my surprise, it stayed up.
Concluding that the gem driver was the culprit, I grabbed a 3Com
10/100 card and plugged it in.

On the next boot it hung when I tried to ftp a large file.

The last thing I tried was to run the following complex program
in the background:

	$ cat > loop.c
	int main(void) { while (1); }
	$ cc -o loop loop.c
	$ ./loop &

With this running, the machine is stable, ftp works, and ping times
are in the 300 usec range.  When I kill "loop", the ping times
are 10 ms, and any ftp of any size hangs the machine.

This is definitely a problem, and I would love to have some help tracking
it down.  I do not believe that it is peculiar to this CPU, nor to
"fast" CPUs.  It looks to me like a race in the kernel, and I bet
it is the cause of other (ahem) anomalies.

My newest pet theory is that there is a bug in the interrupt
handling that only shows up if the CPU is in the idle loop
when it hits.  When loop is running, the system does not take interrupts
while in the idle loop, because it's never _in_ the idle loop.

Suggestions? Sympathy?

-dgl-



excerpt from /var/log/messages....

Sep 26 14:53:31 temp /netbsd: total memory = 512 MB
Sep 26 14:53:31 temp /netbsd: avail memory = 461 MB
Sep 26 14:53:31 temp /netbsd: using 2048 buffers containing 26316 KB of memory
Sep 26 14:53:31 temp /netbsd: mainbus0 (root)
Sep 26 14:53:31 temp /netbsd: cpu0 at mainbus0: 7455 (Revision 2.1), ID 0 (primary)
Sep 26 14:53:31 temp /netbsd: cpu0: HID0 8450c0bc<EMCP,TBEN,NAP,DPM,ICE,DCE,SGE,BTIC,LRSTK,FOLD,BHT>
Sep 26 14:53:31 temp /netbsd: cpu0: 1000.00 MHz 
Sep 26 14:53:31 temp /netbsd: cpu0: 256KB L2 cache, 2MB L3 backside cache
Sep 26 14:53:31 temp /netbsd: uninorth0 at mainbus0
Sep 26 14:53:31 temp /netbsd: pci0 at uninorth0 bus 0 
Sep 26 14:53:31 temp /netbsd: pci0: i/o space, memory space enabled
Sep 26 14:53:31 temp /netbsd: pchb0 at pci0 dev 11 function 0
Sep 26 14:53:31 temp /netbsd: pchb0: Apple Computer UniNorth AGP Interface (rev.