Subject: Re: Race condition in kernel! (was ping times)
To: Donald Lee <MacPPC@caution.icompute.com>
From: John Klos <john@ziaspace.com>
List: port-macppc
Date: 07/17/2004 19:38:27
Hi,

> I just tried booting a CD on a Quicksilver G4/867.  It has the same problem.
> (10 ms quantum ping times to local network)

This is a very interesting problem; I remember seeing this a while ago, 
but I forget where, when, and why. It went away after I switched ethernet 
cards or something. I wish I could remember...

> Could anyone else try a ping from a *fast* MacPPC and see if the ping
> times are all multiples of 10 ms?  I've now seen the problem on two
> machines - an 867 MHz Quicksilver, and a G4/AGP with a
> 1 GHz CPU (both are Moto 7455 CPUs).

I haven't seen this on either a 1.2 GHz 7457 or a 1.3 GHz 7455 on a 100 
MHz bus G4 system. I was running 2.0 beta.
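
For anyone who wants to check their own box, here is a rough Python sketch
that scans saved ping output and reports how many round-trip times sit on a
non-zero multiple of 10 ms. It assumes the usual "time=1.234 ms" field in
each reply line; the function names and the 0.5 ms tolerance are my own
choices for illustration, not anything from the ping or kernel sources.

```python
import re

# Matches the RTT field of a typical ping reply line, e.g. "time=10.012 ms".
# The exact output format is an assumption; adjust the pattern if yours differs.
TIME_RE = re.compile(r"time[=<]\s*([0-9]+(?:\.[0-9]+)?)\s*ms")


def rtts_from_ping_output(text):
    """Return every round-trip time (in ms) found in the ping output."""
    return [float(m.group(1)) for m in TIME_RE.finditer(text)]


def quantized_fraction(rtts, quantum_ms=10.0, tol_ms=0.5):
    """Fraction of RTTs within tol_ms of a non-zero multiple of quantum_ms.

    Sub-millisecond replies (the healthy case on a LAN) round to multiple 0
    and are deliberately not counted as hits.
    """
    if not rtts:
        return 0.0
    hits = 0
    for rtt in rtts:
        k = round(rtt / quantum_ms)
        if k >= 1 and abs(rtt - k * quantum_ms) <= tol_ms:
            hits += 1
    return hits / len(rtts)


if __name__ == "__main__":
    import sys
    rtts = rtts_from_ping_output(sys.stdin.read())
    print("replies:", len(rtts))
    print("fraction on a 10 ms multiple:", quantized_fraction(rtts))
```

Save the output of something like "ping -c 20 <host>" to a file and feed it
to the script on stdin. A fraction near 1.0 on a local network would match
what Donald is describing; a healthy machine should give something near 0.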

> This Quicksilver does not have the L3 cache, so that tends to eliminate
> the L3 cache theory.  I'm now convinced it's a race - probably
> somewhere in the context switch machinery in the kernel.

Actually, it does have L3:
http://www.lowendmac.com/ppc/quicksilver.html
Only the 733 MHz model lacked L3. Silly Apple.

> Anyone have suggestions?  I'm motivated to dig in and puzzle this
> out, but it will be a hard slog for me unless I get a little guidance.

Could you once again go over the specifics of your machines? Standard 
hardware, except for the accelerator? Using motherboard ethernet and IDE? 
Running which NetBSD? (Sorry to ask you to go over it again, but it might 
be good to have it all in one place.)

> Another symptom that I had not connected to this one was that
> my dumps (dump(8)) on the 1 GHz CPU were *really* slow.  I found that
> running a CPU intensive program at the same time the dumps were active
> actually made the dumps run about 15 times faster (yes, 15x).  I looked
> at the dump source, and dump forks off several copies of itself and
> "passes the torch" between the pids to do the I/O.  It looked to me
> at the time like the synchronization in dump was busted, but that behavior
> could also be explained by the kernel dropping events.

Sounds VERY suspicious.
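
To make the "passes the torch" pattern concrete, here is a toy sketch in
Python. It is NOT dump's actual code, and I'm assuming pipes as the hand-off
mechanism purely for illustration; run_ring and its parameters are made up.
Each forked worker blocks until the torch arrives, takes its turn, and hands
the torch to the next worker. The point is that every hand-off depends on
the kernel waking a sleeping process, so a lost or late wakeup stalls the
whole ring until something else (like a timer tick) kicks the scheduler.

```python
import os


def run_ring(n_workers=3, rounds=2):
    """Fork n_workers processes that pass a one-byte torch around a ring.

    Returns the order in which workers took their turns.
    """
    # pipes[i] is the torch pipe that worker i reads from.
    pipes = [os.pipe() for _ in range(n_workers)]
    # Shared log pipe: each worker records its id when it holds the torch.
    log_r, log_w = os.pipe()

    pids = []
    for i in range(n_workers):
        pid = os.fork()
        if pid == 0:
            # Child: wait for the torch, do one "I/O" turn, pass it on.
            try:
                my_r = pipes[i][0]
                next_w = pipes[(i + 1) % n_workers][1]
                for _ in range(rounds):
                    os.read(my_r, 1)             # sleep until the torch arrives
                    os.write(log_w, bytes([i]))  # stand-in for the real I/O
                    os.write(next_w, b"t")       # wake the next worker
            finally:
                os._exit(0)
        pids.append(pid)

    os.write(pipes[0][1], b"t")  # parent hands the torch to worker 0
    for pid in pids:
        os.waitpid(pid, 0)

    # All workers have exited, so every log byte is already in the pipe.
    order = list(os.read(log_r, n_workers * rounds))

    for r, w in pipes:
        os.close(r)
        os.close(w)
    os.close(log_r)
    os.close(log_w)
    return order


if __name__ == "__main__":
    print(run_ring())
```

With three workers and two rounds the turns come out strictly in ring order,
since only the torch holder is ever runnable. That would also fit the odd
speedup with a CPU hog running: if wakeups are only being noticed at the next
scheduling event, a busy CPU gives the kernel more chances to notice them.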

> The other thing that comes to mind is the problems with the ATA
> card.  That also drops interrupts, which could be explained by
> this sort of race.

Yes, all of the IDE cards on macppc that I've been able to try have 
problems. I've tested four different PCI chips so far.

> It would be really nice if this were a long standing bug causing
> a bunch of obscure problems.

Yeah - something is not quite right somewhere, and it'd be good to get it 
fixed before 2.0 comes out.

> Ideas anyone?  How does one get a "trace" of events in a NetBSD kernel
> with extremely fine time granularity so I can see the sequence of
> events through context switches?

I'm interested in learning more about this, too. Matt?

John