port-macppc: Re: What's wrong with with the G4's GMAC interface?

Subject: Re: What's wrong with with the G4's GMAC interface?
To: Bruce Korb <bkorb@allegronetworks.com>
From: Charles M. Hannum <abuse@spamalicious.com>
List: port-macppc
Date: 04/29/2001 23:18:31

FWIW, I think cross-posting this so widely was silly, given that you
don't really have a good handle on what the problem is.

> When NetBSD is running on the Apple G4 PowerMac and the GM0
> interface is under heavy load, the system will appear to stop
> dead in its tracks.  Its behavior actually becomes bursty
> and packets are seen out of order.  This was first reported
> on the net as early as March, 2000, some 14 months ago.

My first thought on reading `stop dead in its tracks' is that the
system *froze*.  That's clearly not the case.  Especially when dealing
with non-NetBSD people, it's worthwhile to be *much* more clear about
things like this.

> Every once in a while when under heavy load, the MII layer
> becomes confused about which entry is next in the circular
> buffer descriptor list.

First of all, `the MII layer' has absolutely nothing to do with where
packets are stored.  It's a serial interface between the MAC and the
PHY, nothing more.  What you meant to say was `the MAC'.

Secondly, what evidence do you have to suggest that it's the hardware
-- and not our driver -- that's confused??  If you're going to
implicate someone else (namely Apple), you need something a lot better
than `seems to be doing something weird' to back it up!

Personally, I still don't see any reason to believe it isn't a bug in
the driver.  There is plenty of suspicious crap going on here -- most
notably a distinct lack of cache and write buffer synchronization all
over the place.

> Apr 20 19:04:37 olive-30 /netbsd: reset GMAC read hand from 31 to 1 for gm0
> Apr 20 19:20:49 olive-30 /netbsd: reset GMAC read hand from 13 to 15 for gm0
> Apr 20 20:01:33 olive-30 inetd[1665]: connection from 10.1.5.197, service
> telnet (tcp)

That's a wonderful syslog to show that your patch actually did have an
effect.  Unfortunately, it says little more.  For example, did you test
to see if all packets were received?  If not, how do you know the
receive function is actually working correctly?  Perhaps your patch is
merely a bandaid that causes the problem to be less annoying.
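
One way to answer that question is to have the sender stamp each test
packet with a sequence number and have the receiver tally what actually
arrived.  The following is only a minimal sketch of such a tally; the
names (check_sequence, struct rx_report, MAX_SENT) are made up for
illustration and are not part of any driver or test tool:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SENT 1024u  /* hypothetical upper bound on the test run size */

/* Hypothetical summary of one test run. */
struct rx_report {
    uint32_t received;      /* distinct valid sequence numbers seen */
    uint32_t missing;       /* numbers in 0..sent-1 that never arrived */
    uint32_t out_of_order;  /* arrivals lower than an earlier arrival */
};

/*
 * seq[] holds the sequence numbers in arrival order; the sender is
 * assumed to have used 0..sent-1.  Returns 0 on success, -1 if the
 * run is larger than this sketch supports.
 */
static int
check_sequence(const uint32_t *seq, size_t n, uint32_t sent,
    struct rx_report *out)
{
    unsigned char seen[MAX_SENT];
    uint32_t highest = 0;
    int have_highest = 0;
    size_t i;

    assert(seq != NULL && out != NULL);
    if (sent > MAX_SENT)
        return -1;

    memset(seen, 0, sizeof(seen));
    memset(out, 0, sizeof(*out));

    for (i = 0; i < n; i++) {
        uint32_t s = seq[i];

        /* Ignore out-of-range numbers and duplicates. */
        if (s >= sent || seen[s])
            continue;
        seen[s] = 1;
        out->received++;

        if (have_highest && s < highest) {
            out->out_of_order++;
        } else {
            highest = s;
            have_highest = 1;
        }
    }
    out->missing = sent - out->received;
    return 0;
}
```

If `missing' stays at zero under the same load that used to trigger the
problem, that says a lot more than the syslog lines do.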

>     if (last >= 0) { /* test for any work done */
>         mb();
>         GM_OUT(GM_RX_KICK, last & 0xFFFFFFFC);
>     }
> 
> The "mb()" macro maps to the "sync" instruction that flushes both the
> data cache and the instruction pipeline.  The "KICK" is the kicker.
> I don't understand it.  "last" is the last successfully processed buffer
> number.  Thus, the "GM_OUT()" macro basically seems to be telling the
> hardware to go back 1 to 4 buffer slots for storing the next
> received transmission.  That seems like it simply cannot be correct,
> but that is certainly the way the code reads!  Can anyone clarify?
> I have no experience using this.

If this were as you suggest, then our own use of `RXKICK' (writing
NRXBUF) wouldn't even make sense.  I think it's more likely that this
sets a tail pointer on the chip, so it knows where in the ring it can
receive to, and when packets are queued, the Linux driver is updating
the tail pointer.  But I admit that's an EWAG [*].
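
To make that guess concrete, here is a toy model of a receive ring with
a tail pointer.  It is a sketch of the idea only, not the real GMAC
register semantics: RING_SLOTS, struct rx_ring and all of the function
names are invented for illustration, and the only thing taken from the
quoted code is the `last & 0xFFFFFFFC' mask.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SLOTS 32u  /* hypothetical ring size; must be a power of two */

/* Hypothetical model of the chip's view of the receive ring. */
struct rx_ring {
    uint32_t head;  /* next slot the chip would fill */
    uint32_t tail;  /* chip may fill slots up to, but not including, this */
};

/* The value written to the kick register: `last' rounded down to a
 * multiple of four, exactly what `last & 0xFFFFFFFC' computes. */
static uint32_t
rx_kick_value(uint32_t last)
{
    return last & 0xFFFFFFFCu;
}

/* Number of slots the chip may still receive into. */
static uint32_t
rx_slots_available(const struct rx_ring *r)
{
    return (r->tail - r->head) & (RING_SLOTS - 1);
}

/* The chip receives one packet: 1 if it was stored, 0 if no slot was free. */
static int
rx_hw_receive(struct rx_ring *r)
{
    assert(r != 0);
    if (rx_slots_available(r) == 0)
        return 0;
    r->head = (r->head + 1) & (RING_SLOTS - 1);
    return 1;
}

/* The driver has processed buffers up to `last' and moves the tail. */
static void
rx_kick(struct rx_ring *r, uint32_t last)
{
    assert(r != 0);
    r->tail = rx_kick_value(last) & (RING_SLOTS - 1);
}
```

In this model the write never tells the chip to `go back'; it only
moves the limit up to which the chip is allowed to receive, and the mask
just means the limit advances in steps of four slots.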

> 5.  What does OS X (FreeBSD deriviative) do?
> 
> Someone at Apple?  Anyone?

There's no way they would be able to help without reviewing
substantially more of the code than you pasted.  My suspicion is that
they scratched their heads and wondered WTF you were talking about.

It's also a bit impolitic to refer to OS X as a `FreeBSD derivative'
here, given that they use a substantial amount of code from NetBSD.


[*] That's `experienced wild ass guess', which does not appear to be in
the wtf(6) list.  B-)