Current-Users archive


Re: sk/skc/makphy breakage on current

On Tue, Jun 17, 2008 at 05:07:10AM +0200, Quentin Garnier wrote:
> > skc0 at pci5 dev 6 function 0: ioapic0 pin 21
> > skc0: interrupt moderation is 0 us
> > skc0: DGE-530T Gigabit Ethernet Adapter rev. (0x9)
> > sk0 at skc0 port A: Ethernet address 00:15:e9:bd:0c:1e
> > makphy0 at sk0 phy 0: Marvell 88E1011 Gigabit PHY, rev. 5
> > skc0: interrupt moderation is 1000 us
> > panic: sk_jfree: buffer not in use!
> That sounds like the issue I "fixed" a while ago in nfe(4).  It appears
> that the callback used to free mbuf external storage is not called at
> splnet() ever since... some unknown changes (Andrew, do you know exactly
> what made that happen?), so you need to introduce a mutex protecting the
> allocator structures.
> It's a rather easy fix, I'll have a try at it later, unless someone
> beats me to it.  All other mbuf external storage allocators should be
> audited, they're likely to suffer from the same issue as well.
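
The fix Quentin describes is a mutex around the allocator's bookkeeping. It is needed because the external-storage free callback can no longer be assumed to run at splnet(). Below is a rough user-space sketch of that idea. It is hypothetical and is not the sk(4) code. The names `jpool`, `jpool_alloc()`, `jpool_free()` and the pool sizes are invented for illustration. A POSIX pthread mutex stands in for the kernel's mutex(9) (`mutex_enter()`/`mutex_exit()`).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define JPOOL_NBUFS 4      /* hypothetical number of jumbo buffers */
#define JPOOL_BUFSZ 9018   /* hypothetical jumbo buffer size */

/* Hypothetical jumbo buffer pool, loosely modelled on a driver allocator. */
struct jpool {
	pthread_mutex_t lock;                     /* stands in for a kmutex_t */
	unsigned char bufs[JPOOL_NBUFS][JPOOL_BUFSZ];
	int inuse[JPOOL_NBUFS];                   /* 1 = handed out, 0 = free */
};

static void
jpool_init(struct jpool *jp)
{
	pthread_mutex_init(&jp->lock, NULL);
	for (int i = 0; i < JPOOL_NBUFS; i++)
		jp->inuse[i] = 0;
}

/* Hand out a free buffer, or NULL if the pool is exhausted. */
static void *
jpool_alloc(struct jpool *jp)
{
	void *buf = NULL;

	pthread_mutex_lock(&jp->lock);
	for (int i = 0; i < JPOOL_NBUFS; i++) {
		if (!jp->inuse[i]) {
			jp->inuse[i] = 1;
			buf = jp->bufs[i];
			break;
		}
	}
	pthread_mutex_unlock(&jp->lock);
	return buf;
}

/*
 * Free callback for the external storage.  It may run without splnet()
 * held, so it takes the same lock as jpool_alloc() before touching inuse[].
 */
static void
jpool_free(struct jpool *jp, void *buf)
{
	ptrdiff_t off = (unsigned char *)buf - &jp->bufs[0][0];
	size_t i = (size_t)(off / JPOOL_BUFSZ);

	pthread_mutex_lock(&jp->lock);
	if (off < 0 || off % JPOOL_BUFSZ != 0 || i >= JPOOL_NBUFS ||
	    !jp->inuse[i]) {
		pthread_mutex_unlock(&jp->lock);
		fprintf(stderr, "jpool_free: buffer not in use!\n");
		abort();        /* the kernel version would panic() here */
	}
	jp->inuse[i] = 0;
	pthread_mutex_unlock(&jp->lock);
}

/* Each worker repeatedly allocates and frees one buffer. */
static void *
jpool_worker(void *arg)
{
	struct jpool *jp = arg;

	for (int n = 0; n < 100000; n++) {
		void *b = jpool_alloc(jp);
		/* At most one buffer per thread, so the pool never runs dry. */
		assert(b != NULL);
		jpool_free(jp, b);
	}
	return NULL;
}

/* Hammer the pool from JPOOL_NBUFS threads at once; 0 on success. */
static int
jpool_stress(struct jpool *jp)
{
	pthread_t t[JPOOL_NBUFS];
	int n, err = 0;

	for (n = 0; n < JPOOL_NBUFS; n++) {
		if (pthread_create(&t[n], NULL, jpool_worker, jp) != 0) {
			err = -1;
			break;
		}
	}
	for (int i = 0; i < n; i++)
		pthread_join(t[i], NULL);
	return err;
}
```

Without the lock, concurrent alloc/free calls could race on `inuse[]`. That is the kind of corruption that ends in a "buffer not in use!" panic. With the lock, the stress run completes and every buffer ends up free. Build with `-pthread`.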

This problem also seems to occur on USB network cards:
 Jun 20 20:41:43 byers /netbsd: ral0: could not retrieve Tx statistics - cancelling automatic rate control
 Jun 20 20:41:49 byers /netbsd: ral0: could not transmit buffer: IOERROR
 Jun 20 20:41:54 byers /netbsd: ral0: device timeout
 Jun 20 20:42:07 byers /netbsd: nfs server frohike:/home/ftp/sharing: not 
 Jun 20 20:42:18 byers last message repeated 3 times
 Jun 20 20:42:32 byers /netbsd: ral0: could not read MAC register: IOERROR
 Jun 20 20:42:32 byers /netbsd: ral0: could not write MAC register: IOERROR
 Jun 20 20:42:32 byers /netbsd: ral0: could not set test mode: IOERROR
 Jun 20 20:42:32 byers /netbsd: ral0: could not write MAC register: IOERROR
 Jun 20 20:42:32 byers last message repeated 22 times

The only way to get ral0 working again is to unplug it and plug it back in.
Often this results in:
 Jun 20 20:42:32 byers /netbsd: uhub1: device problem, disabling port 1
The LED on the card then dims. After a few attempts ral0 is finally detected
again, and after some fiddling with dhclient and wpa_supplicant I can bring
the network back up, until the problem recurs. Sometimes it happens soon after
reconnecting; sometimes it takes a long while.

I skimmed the code, and these errors all appear to come from calls into the
usbd subsystem that return error codes to the callback registered by ural, so
nothing here is ral-specific.
If I understand correctly, each buffer allocated by usbd (iface->endpoints)
needs to be protected by a mutex, but I may be mistaken about this.
Someone with knowledge of the USB subsystem should probably look at this.

"The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music."
                                                        -- Donald Knuth
