Subject: Re: kern/16346: pcn driver panics on startup in IBM PC/325
To: Allen Briggs <briggs@wasabisystems.com>
From: Greg A. Woods <woods@weird.com>
List: tech-net
Date: 09/10/2003 02:49:36
[ On Tuesday, September 9, 2003 at 22:47:53 (-0400), Allen Briggs wrote: ]
> Subject: Re: kern/16346: pcn driver panics on startup in IBM PC/325
>
> This change also helps the DP83815 deal with auto-negotiation.
> When repeatedly unplugging and replugging the cable, the DP83815
> can get stuck in a weird state where it won't actually recover link
> without an 'ifconfig sip0 down ; ifconfig sip0 up'.
> 
> Putting delay(500) in mii_phy_reset() seems to prevent this behavior.

Wow!  That's cool!  I'm glad to hear it worked for something!

Unfortunately it didn't seem to do anything for my IBM PC/325.

I'm not 100% sure I tested everything correctly, though.  I suffered a
crash the other day and tried tweaking and compiling a test kernel
while in single-user mode, so I may not have got everything right, and
I've not had a chance to review and try again under more controlled
circumstances.

The fact it seems to be helping the DP83815 work more reliably is at
least a hint that we may be on the right track for the DP83840 too.

For now I'm still using the old le driver, but it's still kicking out
occasional timeout messages in 1.6 STABLE (though I've not noticed any
related application delays or problems).

A quick ping flood test I did on one earlier test boot with the cable
unplugged suggested that when it's working the pcn driver might be as
much as 50% faster, at least with small packets, and that's making me
more eager to get the PHY working reliably too.

> I don't know if this is a quirk of these PHYs or not.  It might make
> sense that it is, but I'm also wondering if it's something that might
> be seen elsewhere.  It would be easy enough to special-case these
> PHY drivers' reset routine, but if this change could help others, too,
> I'd hate to have it fixed in some places, but not everywhere.

I don't think it can hurt any other PHY....  :-)

Given how these things seem to work, and given there's already a delay
loop in the common reset code, I'm guessing the reset delay may be
quite a common requirement.  It's generally a bad idea to mess with the
inputs to a hardware device before it has got itself into a stable and
workable state after a reset and, IIUC, PHY_READ() has to send a command
to read the requested value back over the serial interface.
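
To make the idea concrete, here is a rough user-space sketch of a reset
routine that waits before its first register read.  It is a simulation
only, not the actual sys/dev/mii code: the fake_phy structure, the
fake_* helpers, and the SETTLE_US and RESET_DONE_US values are all
assumptions made up for illustration (500 is just the figure discussed
in this thread), and the 100 x 1000us poll loop is only loosely modelled
on the common reset code.  The simulated PHY "wedges" if it is read
before the settle time has passed, which is a guess at the failure mode,
not something taken from a data sheet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BMCR_RESET     0x8000u  /* IEEE 802.3 basic control register, bit 15 */
#define SETTLE_US      500u     /* assumed quiet time needed after reset */
#define RESET_DONE_US  2000u    /* assumed time for the fake PHY to finish reset */

/* Hypothetical simulated PHY; none of this is the real mii(4) API. */
struct fake_phy {
	uint32_t now_us;       /* simulated clock, in microseconds */
	uint32_t reset_at_us;  /* time at which BMCR_RESET was written */
	int      in_reset;     /* non-zero while the reset bit reads back set */
	int      wedged;       /* non-zero if read too soon after reset */
};

/* Stand-in for delay(): just advances the simulated clock. */
static void
fake_delay(struct fake_phy *phy, uint32_t us)
{
	phy->now_us += us;
}

/* Stand-in for PHY_WRITE(sc, MII_BMCR, val). */
static void
fake_phy_write_bmcr(struct fake_phy *phy, uint16_t val)
{
	if (val & BMCR_RESET) {
		phy->in_reset = 1;
		phy->reset_at_us = phy->now_us;
	}
}

/* Stand-in for PHY_READ(sc, MII_BMCR). */
static uint16_t
fake_phy_read_bmcr(struct fake_phy *phy)
{
	uint32_t elapsed = phy->now_us - phy->reset_at_us;

	/* Touching the PHY before it is stable wedges it (assumed model). */
	if (phy->in_reset && elapsed < SETTLE_US)
		phy->wedged = 1;

	/* A PHY that was left alone clears the reset bit once it is done. */
	if (phy->in_reset && !phy->wedged && elapsed >= RESET_DONE_US)
		phy->in_reset = 0;

	return phy->in_reset ? BMCR_RESET : 0;
}

/*
 * Reset the fake PHY, wait settle_us before the first read, then poll
 * for up to 100 iterations of 1000us each.
 * Returns 0 on success, -1 if the reset bit never cleared.
 */
static int
fake_phy_reset(struct fake_phy *phy, uint32_t settle_us)
{
	int i;

	assert(phy != NULL);

	fake_phy_write_bmcr(phy, BMCR_RESET);
	fake_delay(phy, settle_us);	/* the proposed extra delay */

	for (i = 0; i < 100; i++) {
		if ((fake_phy_read_bmcr(phy) & BMCR_RESET) == 0)
			return 0;
		fake_delay(phy, 1000);
	}
	return -1;
}
```

Calling fake_phy_reset() with a settle time of 0 leaves the simulated
PHY wedged and the poll loop times out, while a settle time of SETTLE_US
lets the reset complete within a few polls.  Whether real PHYs misbehave
in this particular way is exactly the open question.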

> My change was delay(500), and it was enough for sip(4) in this case.

delay(500) is, if I have the numbers right, the minimum for the DP83840A
in the IBM, and I always like to err on the conservative side for these
kinds of things.  I don't think a much longer delay will hurt
anything.  After all, the existing delay loop will wait up to 100 times
as long.

> I have a pcn(4) (on the same kind of system--IBM/325) with the same
> problem as the PR, but it's also in production and I'm not rebooting
> it unless I have to.

When you get a chance it would probably still be worth trying the change
on your system.  My Ethernet switch is a DEC VNswitch, which is a bit
quirky (though I think I've locked it down on that port), and perhaps
it's aggravating the problem.

I'm also going to try an even longer initial delay on my next test just
in case the specs on the older rev. DP83840 chip that's actually in my
IBM are any different from those given in the DP83840A data sheet I was
reading.

-- 
						Greg A. Woods

+1 416 218-0098                  VE3TCP            RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>          Secrets of the Weird <woods@weird.com>