Subject: Re: Recommendations wanted for 100baseTX cards
To: None <darkstar@pgh.net>
From: Bill Paul <wpaul@ctr.columbia.edu>
List: port-i386
Date: 01/31/2000 10:57:48
Of all the gin joints in all the towns in all the world, Matthew Orgass
had to walk into mine and say:
 
> On Sun, 30 Jan 2000, Bill Paul wrote:
> 
> > Hm. What isn't working exactly? I didn't have too much trouble getting it
> > to work on FreeBSD (except for the transmit corruption problem due to the
> > broken scatter/gather DMA). One of the things I was told during my exchanges
> > with the Davicom people is that the DM9102A requires the TX and RX
> > descriptors to be aligned on 16 byte boundaries (4 DWords alignment, as
> > they put it). It's possible the DM9102 has the same restriction.
> >
> > Hard to say what could be going wrong without more details though.
> 
>   The symptoms are:
> 1) It never sends a transmit complete interrupt, only early transmit
> interrupts (if requested, if not it does nothing).

Hm. Something else I never noticed. In my driver, I handle both the
transmit complete and no transmit buffer available condition as the same
thing (all pending TX descriptors have been sent, ditch all the mbufs).
One of these must be firing, otherwise I'd be running out of mbufs
really quickly.

It looks like Davicom's own driver for Linux doesn't even bother to check
any of the 'something transmit-related happened' bits in the status
register. Their interrupt handler just looks to see if there are any
transmissions pending at all, and then goes through them and tests the
owner bits in the descriptors to see if it's safe to reclaim them.
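
For what it's worth, that owner-bit approach can be sketched roughly as below. This is only an illustration, not Davicom's code or mine: the descriptor layout, the ring size, the names (tx_ring, tx_reclaim, free_pkt) and the assumption that the owner bit is bit 31 of the status word (as on Tulip-style chips) are all made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TX_RING_LEN 8            /* hypothetical ring size */
#define DESC_OWN    0x80000000u  /* assumed: bit 31 set means the chip owns it */

struct tx_desc {                 /* simplified, hypothetical descriptor */
    volatile uint32_t status;    /* owner bit plus status bits */
    uint32_t ctl;
    uint32_t buf;
    uint32_t next;
};

struct tx_ring {
    struct tx_desc desc[TX_RING_LEN];
    void *pkt[TX_RING_LEN];      /* companion array: packet per slot, or NULL */
    unsigned head;               /* oldest descriptor still pending */
    unsigned pending;            /* number of descriptors handed to the chip */
};

/*
 * Walk the pending descriptors from oldest to newest and reclaim every
 * one the chip has handed back (owner bit clear). Stop at the first one
 * the chip still owns. No interrupt status bits are consulted at all.
 * Returns the number of descriptors reclaimed.
 */
static unsigned tx_reclaim(struct tx_ring *r, void (*free_pkt)(void *))
{
    unsigned done = 0;

    assert(r != NULL);
    while (r->pending > 0) {
        struct tx_desc *d = &r->desc[r->head];

        if (d->status & DESC_OWN)
            break;               /* chip is not finished with this one yet */
        if (r->pkt[r->head] != NULL && free_pkt != NULL)
            free_pkt(r->pkt[r->head]);
        r->pkt[r->head] = NULL;
        r->head = (r->head + 1) % TX_RING_LEN;
        r->pending--;
        done++;
    }
    return done;
}
```

Calling something like this from the interrupt handler whenever any descriptors are pending means a missing 'transmit complete' interrupt does not matter much, as long as some interrupt arrives.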

> 2) It *does* clear the owner bit on the setup packet. 

Uh... but not on any of the transmitted frames?

> 3) It fails to idle with transmit status RUNNING-CLOSE-CLEAR OWNER and
> receive status RUNNING-WAIT.
> 4) The SROM reports bogus GPR media

Yeah, well, I never pay attention to the SROM anyway. But that's another
story.

I gather then that the main problem is that it doesn't transmit. (Since
you said nothing about receive, I'll assume that works.) I don't have the
tlp driver in front of me so it's hard to say what could be wrong, but
I can tell you what I'm doing that seems to be working:

Initialization:
- Do a global reset
- Set CSR0 to 0, which Davicom claims is necessary since any other
  setting may lead to instabilities.
- Set the TX threshold to the minimum (best performance) value.
- Set 'no RX CRC' bit.
- Initialize descriptor queues. I use chained descriptor mode only,
  since the same driver needs to support the ASIX chip which only
  works in chained descriptor mode. The descriptors are allocated
  as a single chunk starting on a page boundary (using contigmalloc())
  and split up into two arrays, one for TX and one for RX. Each
  descriptor array has a companion 'chain' array which contains
  the virtual mbuf pointers and a few other sundry things. The 'chain'
  arrays don't need to be contigmalloc()ed. Since the descriptor
  memory is page aligned, the descriptors themselves are also aligned
  on 16-byte boundaries, like Davicom suggests.
  (This is not how I did it originally: I defined my own descriptor
  structure with the chip's descriptor structure on top and the mbuf
  pointers appended to that, rounded out to some sensible size. This
  is *supposed* to be ok for chained descriptor mode since the
  descriptors can be anywhere as long as they don't get split over a
  page boundary. However I really wanted to see what difference it
  would make if I used fixed length ring mode since that allows you
  to use two fragment pointers per descriptor. But I had a lot of trouble
  with some chips trying to define a non-zero skip length, so I decided
  to throw out the custom structure definition and just make two sets
  of arrays, which would end up using the same amount of memory anyway.
  Then I discovered that fixed length ring mode didn't really seem to
  help performance, so I went back to chained mode but left the descriptor
  allocation alone.)
- Load the physical addresses of the heads of the RX and TX descriptor
  rings into 'RX ring base address' and 'TX ring base address' registers.
- Enable interrupts.
- Set the 'TX on' bit in CSR6 to enable the transmitter.
- Load the RX filter. This consumes the first descriptor in the TX
  ring and leaves the chip looking at the second descriptor, which is
  where the first packet will go. I program the receiver in 'hash
  perfect' mode (one perfect filter entry for the station address plus
  the 512-bit multicast hash table). I also set the bit in the hash
  table that relates to the broadcast address to enable reception of
  broadcast frames. If the IFF_PROMISC flag is set, I set the promiscuous
  mode bit in CSR6, otherwise I clear it. If the IFF_ALLMULTI bit is
  set, I set the 'receive all multicasts' bit in CSR6, otherwise I
  clear it.
- Set the 'RX on' bit in CSR6 to enable the receiver.
- Write a value to the 'RX DMA start' register to start the RX process going.
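
The descriptor allocation step above can be sketched in user space as follows. This is a minimal sketch under assumptions, not the actual driver code: aligned_alloc() stands in for contigmalloc(), the 4-word descriptor layout is the generic Tulip-style one, and the names (list_data, ring_alloc, ring_chain, desc_aligned), the ring sizes and the physical base address passed to ring_chain() are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define RX_LIST_CNT 64           /* hypothetical ring sizes */
#define TX_LIST_CNT 64
#define PAGE_SZ     4096
#define DESC_ALIGN  16           /* 4 DWords, per Davicom */

struct desc {                    /* assumed 4-word descriptor */
    uint32_t status;
    uint32_t ctl;
    uint32_t data;               /* buffer address */
    uint32_t next;               /* address of next descriptor (chained mode) */
};
_Static_assert(sizeof(struct desc) == DESC_ALIGN, "descriptor must be 16 bytes");

struct list_data {               /* one chunk, split into two arrays */
    struct desc rx_list[RX_LIST_CNT];
    struct desc tx_list[TX_LIST_CNT];
};

/* Allocate the chunk on a page boundary (stand-in for contigmalloc()). */
static struct list_data *ring_alloc(void)
{
    size_t sz = (sizeof(struct list_data) + PAGE_SZ - 1) / PAGE_SZ * PAGE_SZ;
    struct list_data *ld = aligned_alloc(PAGE_SZ, sz);

    if (ld != NULL)
        memset(ld, 0, sz);
    return ld;
}

/* True if p sits on a 16-byte boundary. */
static int desc_aligned(const void *p)
{
    return ((uintptr_t)p % DESC_ALIGN) == 0;
}

/*
 * Link n descriptors into a circle. ring_phys is the (assumed known)
 * bus address of ring[0]; the last descriptor points back at the first.
 */
static void ring_chain(struct desc *ring, unsigned n, uint32_t ring_phys)
{
    for (unsigned i = 0; i < n; i++) {
        unsigned nxt = (i + 1) % n;

        ring[i].next = ring_phys + (uint32_t)(nxt * sizeof(struct desc));
    }
}
```

Since the chunk starts on a page boundary and each descriptor is exactly 16 bytes, every element of both arrays lands on a 16-byte boundary without any extra padding.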

Performance is not stellar due to the transmit buffer coalescing, but
it gets off the ground pretty reliably.
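
By coalescing I mean copying a fragmented packet into one contiguous buffer so the chip only ever sees a single fragment, which sidesteps the scatter/gather problem at the cost of a copy per packet. A rough sketch of the idea, assuming a made-up 'frag' list as a stand-in for an mbuf chain (the struct and the coalesce() name are hypothetical, not the driver's):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct frag {                    /* hypothetical stand-in for an mbuf chain */
    const unsigned char *data;
    size_t len;
    struct frag *next;
};

/*
 * Copy every fragment of the chain into dst, back to back.
 * Returns the total length copied, or 0 if the chain does not fit
 * in cap bytes (dst may be partially written in that case).
 */
static size_t coalesce(const struct frag *f, unsigned char *dst, size_t cap)
{
    size_t off = 0;

    assert(dst != NULL);
    for (; f != NULL; f = f->next) {
        if (f->len == 0)
            continue;            /* skip empty fragments */
        if (f->len > cap - off)
            return 0;            /* would overflow the destination */
        memcpy(dst + off, f->data, f->len);
        off += f->len;
    }
    return off;
}
```

The resulting buffer is then handed to the chip through a single TX descriptor.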

-Bill

-- 
=============================================================================
-Bill Paul            (212) 854-6020 | System Manager, Master of Unix-Fu
Work:         wpaul@ctr.columbia.edu | Department of Electrical Engineering
Home:  wpaul@skynet.ctr.columbia.edu | Columbia University, New York City
=============================================================================
"Mulder, toads just fell from the sky!" "I guess their parachutes didn't open."
=============================================================================