tech-net archive

CRC errors with gem(4)


I'm trying to track down a bug with copper gem cards: they generate
invalid frames when sending lots of back-to-back UDP frames.
A simple way to reproduce this is to run:

  /tmp/ttcp -u -s -t -b 32768 -n 10 -l 16384 <somehost>

using a gem card.  It consistently generates invalid frames, e.g. at
100Mb/s, my Cisco switch always sees 35 CRC errors for this command.
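
For context on what this command puts on the wire, here is a rough sketch
of how one 16384-byte UDP write is split into IP fragments.  It assumes a
1500-byte MTU, a 20-byte IPv4 header without options and an 8-byte UDP
header; the function name is made up for illustration.

```python
# Sketch only: assumes MTU 1500, 20-byte IPv4 header (no options),
# 8-byte UDP header.  fragments() is a hypothetical helper name.
MTU = 1500
IP_HEADER = 20
UDP_HEADER = 8


def fragments(udp_payload_len, mtu=MTU):
    """Return a list of (byte_offset, ip_total_length, more_fragments)."""
    remaining = udp_payload_len + UDP_HEADER
    # Fragment payloads (except the last) must be a multiple of 8 bytes.
    per_frag = (mtu - IP_HEADER) // 8 * 8
    out = []
    offset = 0
    while remaining > 0:
        chunk = min(per_frag, remaining)
        remaining -= chunk
        out.append((offset, chunk + IP_HEADER, remaining > 0))
        offset += chunk
    return out


frags = fragments(16384)  # the -l 16384 buffer size from the ttcp command
for offset, total_len, more in frags:
    print(f"offset {offset:5d}  ip length {total_len:4d}  more={more}")
```

Under these assumptions each datagram becomes eleven full-size fragments
followed by one short fragment, so the card sees bursts of back-to-back
1514-byte frames with a small frame at the end of each burst.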

I noticed that it's possible to program the gem chip to pass up packets
with invalid CRC, so I added this to the driver and looped back gem1 to
gem0 with a cross-over cable.  Now, when I run the command from gem1, and
capture with:

  tcpdump -e -x -vv -i gem0 > /tmp/tcpdump.out 2>&1 &

I see lots of good packets:

  16:03:21.173534 00:03:ba:68:35:4a > 08:00:20:f7:8e:80, ethertype IPv4 
(0x0800), length 1514: IP (tos 0x0, ttl  64, id 34, offset 13320, flags [+], 
length: 1500) anor > sirion: udp
        0x0000:  4500 05dc 0022 2681 4011 d010 5102 6e2a  E...."&.@...Q.n*
        0x0010:  5102 6e2f 2c2d 2e2f 3031 3233 3435 3637  Q.n/,-./01234567
        0x0020:  3839 3a3b 3c3d 3e3f 4041 4243 4445 4647  89:;<=>?@ABCDEFG
        0x0030:  4849 4a4b 4c4d 4e4f 5051 5253 5455 5657  HIJKLMNOPQRSTUVW
        0x0040:  5859 5a5b 5c5d 5e5f 6061 6263 6465 6667  XYZ[\]^_`abcdefg
        0x0050:  6869                                     hi 

and occasional packets like:

  16:03:21.206802 20:f7:8e:80:00:03 > 37:38:39:3a:08:00, ethertype Unknown 
(0xba68), length 150:
        0x0000:  354a 0800 4500 0084 0022 07f3 4011 f3f6  5J..E...."..@...
        0x0010:  5102 6e2a 5102 6e2f 3b3c 3d3e 3f40 4142  Q.n*Q.n/;<=>?@AB
        0x0020:  4344 4546 4748 494a 4b4c 4d4e 4f50 5152  CDEFGHIJKLMNOPQR
        0x0030:  5354 5556 5758 595a 5b5c 5d5e 5f60 6162  STUVWXYZ[\]^_`ab
        0x0040:  6364 6566 6768 696a 6b6c 6d6e 6f70 7172  cdefghijklmnopqr
        0x0050:  7374                                     st


  16:03:21.472989 08:00:20:f7:8e:80 > 46:47:48:49:4a:4b, 802.3, length 66: LLC, 
dsap Unknown (0xba), ssap Unknown (0x68), cmd 0x35, sap 68 > sap ba rnr 
(r=37,C) len=48   
        0x0000:  ba68 354a 0800 4500 0020 0000 0000 4011  .h5J..E.......@.
        0x0010:  fc6f 5102 6e2a 5102 6e2f fffa 1389 000c  .oQ.n*Q.n/......
        0x0020:  2bb0 2021 2223 0000 0000 0000 0000 0000  +..!"#..........
        0x0030:  0000 0000                                ....

Some expected packets don't appear in the capture (they could be dropped
by the receiving hardware though).
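
One way to read the malformed frames is to look for the expected Ethernet
header somewhere other than offset 0.  The sketch below rebuilds the start
of the first bad frame from the tcpdump output above (the 14 bytes shown
by -e plus the first 24 bytes of the hex dump) and searches it for the
header of a good frame.  The helper names are made up for illustration.

```python
import struct

# Header of a good frame, from the tcpdump -e line above:
# dst 08:00:20:f7:8e:80, src 00:03:ba:68:35:4a, ethertype 0x0800.
GOOD_HEADER = bytes.fromhex("080020f78e80" "0003ba68354a" "0800")

# Start of the first malformed frame: what tcpdump took to be
# dst/src/ethertype, followed by the first 24 bytes of its hex dump.
BAD_FRAME = bytes.fromhex(
    "3738393a0800" "20f78e800003" "ba68"
    "354a0800" "45000084" "002207f3" "4011f3f6" "51026e2a" "51026e2f"
)


def header_offset(frame, header=GOOD_HEADER):
    """Byte offset of the expected Ethernet header, or -1 if absent."""
    return frame.find(header)


def decode_ipv4(hdr):
    """Decode a few fields of a 20-byte IPv4 header (no options)."""
    (_ver_ihl, _tos, total_len, ident, flags_frag,
     _ttl, proto, _csum, _src, _dst) = struct.unpack("!BBHHHBBH4s4s", hdr[:20])
    return {
        "total_len": total_len,
        "id": ident,
        "more_fragments": bool(flags_frag & 0x2000),
        "frag_offset": (flags_frag & 0x1FFF) * 8,
        "proto": proto,
    }


shift = header_offset(BAD_FRAME)
stale = BAD_FRAME[:shift]
ip = decode_ipv4(BAD_FRAME[shift + len(GOOD_HEADER):])
print(f"header found at offset {shift}, preceded by {stale.hex()}")
print(ip)
```

For this frame the real header turns up 4 bytes in, preceded by bytes that
look like ttcp payload pattern (37 38 39 3a), and the IP header behind it
decodes as a 132-byte final fragment of datagram id 34, the same id as the
good packet.  That would account for the reported length of 150
(4 + 14 + 132).  The second bad frame looks similar, with the destination
address 08:00:20:f7:8e:80 appearing where the source address should be,
which suggests a 6-byte shift, though its type/length field is not visible
in the dump.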

A hack to get round this is to add a delay(70) before transmitting each
full-size UDP packet.  Smaller delays don't help.  I've also tried
increasing the inter-packet gap (which had no effect) and making the card
generate an interrupt for each UDP packet sent (which helped a little -
CRC errors dropped to 7).
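
To put the delay(70) figure in perspective, the sketch below estimates how
long a full-size frame occupies the wire at 100Mb/s.  It assumes delay()
takes microseconds (as documented in delay(9)) and uses the standard
Ethernet overheads of 8 bytes preamble/SFD, 4 bytes FCS and a 12-byte
minimum inter-packet gap; the function name is made up for illustration.

```python
# Sketch only: standard Ethernet framing overheads, in bytes.
PREAMBLE_SFD = 8
FCS = 4
MIN_IFG = 12


def wire_time_us(frame_len, link_bps):
    """Microseconds a frame of frame_len bytes (without FCS) occupies the wire."""
    bits = (PREAMBLE_SFD + frame_len + FCS + MIN_IFG) * 8
    return bits / link_bps * 1e6


full = wire_time_us(1514, 100e6)  # 1514 is the length tcpdump reports
delay_us = 70                     # assumed unit: microseconds
print(f"full-size frame: {full:.2f} us on the wire")
print(f"delay(70) is {delay_us / full:.0%} of that")
```

By this estimate a full-size frame takes about 123 microseconds at
100Mb/s, so the 70 microsecond delay is a bit more than half a frame time.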

I don't see the problem with TCP.  I haven't tested IPv6.  Hardware
checksums are off.  This happens with 4.0 and -current on both sparc64
and macppc.

It looks like the hardware generates the correct TX complete interrupts
even for the invalid and the missing packets.

If anyone has any ideas as to why this might be happening (bugs in the gem
DMA code or hardware errors), that would be great.



PS.  Thanks to dyoung@ for pointers (and gem fixes) and to riz@ for testing.

     The complete tcpdump is at:

  My other computer also runs NetBSD    /        Sailing at Newbiggin        /
