Subject: Re: weird network delays connecting to cable provider's servers...
To: Jonathan Stone <jonathan@DSG.Stanford.EDU>
From: Greg Oster <firstname.lastname@example.org>
Date: 04/21/1999 18:57:11
[Oh yay! Another mailing list I need to subscribe to... Sigh... and this one
even looks like I should have subscribed to it ages ago.. :-/ ]
Jonathan Stone writes:
> In message <199904212249.IAA18609@wombat.cs.rmit.edu.au>
> Luke Mewburn writes:
> >so, after i discovered that my friend's 1.3.3/sparc box was ok,
> >i decided to test a few things myself. here's a summary
> >	NetBSD-1.3.3/sparc	with le0	good
> >	NetBSD-1.3.3/i386	with we0	bad
> >	NetBSD-current/i386	with we0	bad
> >	NetBSD-current/i386	with ex0	good
> >we0 is either a SMC 8216 or 8013; I've tried both.
> >ex0 is a 3com 3c905.
The cards I'm seeing grief with are RTL8019's. (ne2000 critters)
> >So, it looks like the problem is the we driver. However, it's only bad
> >when connected to the cable ISP's hosts. To recap:
> > netbsdbox <-> cable modem <-> cable isp <-> internet <-> other_hosts
> >                                   |
> >                            cable_isp_hosts
> >* ftp between netbsdbox/we0 <-> other_hosts is ok; 200k/s, as I'd expect,
> > given that I think the link from the ISP to the internet is a 2Mb link
> >* ftp between netbsdbox/we0 <-> cable_isp_hosts is bad; 1.5k/s
> >* ftp between netbsdbox/ex0 <-> other_hosts is unknown (i didn't test :/ ),
> > but i'd expect it to be 200k/s
> >* ftp between netbsdbox/ex0 <-> cable_isp_hosts is good; 600k/s
> To the level of detail you give, this is _exactly_ the symptoms Greg
> Oster encountered and mentioned elsewhere. Greg is using an ne2000.
> Between us, Greg and I ascertained that the data getting down to the
> driver (and into bpf on the sending machine) was good, but we didn't
> have a second machine to monitor the actual cable between the ne2000
> and the cable modem.
Yes... I've just finished reading the archives of this, and it's *EXACTLY*
what I'm seeing... Even the tcpdump behaviour matches. From the traces
I've got (from both my home end, and near "other_host"), what's happening
is that the IP packet length field in the IP packet header is wrong by 4
bytes. (The TCP ACKs have a size of 0x28 when they leave my box, but have an
(apparent) size of 0x2c on the receiving end. The IP layer seems to pick up
the next 4 bytes from whatever buffer the packet is in when the receiving end
gets it, but the TCP layer rejects the packet as the TCP checksums (usually)
fail!) Even more interesting is that this is exactly a 1-bit error... :-(
The problem *only* shows up on the first ACK for a given data packet. The
ACK's for the retransmitted data get through just fine!!
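To make the arithmetic concrete, here is a minimal Python sketch (not taken from the traces; the addresses, ports, sequence numbers, and the four stray bytes are all made-up illustration values). It shows that 0x28 and 0x2c differ in exactly one bit, and that a bare 20-byte TCP ACK which verifies correctly will normally fail its checksum once the receiver believes it is 4 bytes longer, since both the extra bytes and the length in the pseudo-header feed into the sum.

```python
import struct

def differing_bits(a, b):
    """Bit positions (0 = least significant) where a and b differ."""
    x = a ^ b
    return [i for i in range(x.bit_length()) if (x >> i) & 1]

def internet_checksum(data):
    """Ones-complement checksum over 16-bit big-endian words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def tcp_checksum(src, dst, segment):
    """Checksum over the IPv4 pseudo-header followed by the TCP segment."""
    pseudo = src + dst + struct.pack("!BBH", 0, 6, len(segment))
    return internet_checksum(pseudo + segment)

def tcp_ok(src, dst, segment):
    """A segment carrying a correct checksum sums to zero on verification."""
    return tcp_checksum(src, dst, segment) == 0

# Hypothetical endpoints (documentation address ranges), not real hosts.
SRC = bytes([192, 0, 2, 1])
DST = bytes([198, 51, 100, 1])

# A bare ACK: 20-byte TCP header, no options, no payload.
# With a 20-byte IP header that gives an IP total length of 0x28.
hdr = struct.pack("!HHIIBBHHH",
                  1025, 21,        # hypothetical source/destination ports
                  1000, 2000,      # hypothetical sequence/ack numbers
                  5 << 4, 0x10,    # data offset = 5 words, ACK flag set
                  8192, 0, 0)      # window, checksum placeholder, urgent ptr
csum = tcp_checksum(SRC, DST, hdr)
good = hdr[:16] + struct.pack("!H", csum) + hdr[18:]

# What the receiver would see if the IP length grew from 0x28 to 0x2c:
# the same segment plus 4 stray bytes (arbitrary values chosen here).
grown = good + b"\xde\xad\xbe\xef"

print(differing_bits(0x28, 0x2C))
print(tcp_ok(SRC, DST, good), tcp_ok(SRC, DST, grown))
```

The check fails only "usually" because an unlucky combination of stray bytes could cancel out the change in the sum; the ones-complement checksum is only 16 bits wide.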
> Luke, If you can do that in your setup (coax, or nonswitched 10baseT),
> please let me know and I'll send you copies of my libpcap tools. Or
> just a tcpdump capture (with -s1500) on both the we0 machine and an
> adjacent machine, that'd verify the point of failure.
Are the hosts that you are connecting to linux-based? I seem to only have
trouble connecting to linux boxes for whatever reason... (non-linux hosts on
the same network segment do not seem to trigger this :-( )
> Then we can take the dp8390 code apart with a fine-toothed comb.
> >So, it looks like a bug in the `we' driver here. Whilst I can put a 3c905
> >card in my box, I think that shipping 1.4 with a dodgy we driver isn't
> >something we should do, unless we put something in the release notes like
> >``this card is supported unless being used to connect to Telstra's Big
> >Pond Cable'' ;-|
> ``What he said''. I'm especially concerned by the mounting evidence
> of a bug in a NetBSD ethernet driver for very common hardware, but
> where we don't quite know what tickles it.
> I'm wondering if this is something specific to cable modems and
> dp8390-based boards. Late collisions or bad collision-detect, maybe?
I've got a CyberSURFR here, if that matters. I'm more than willing to test
any drivers if it gets this resolved :-)
I've got (full packet) tcpdumps from both sides of an FTP that I can make
available if anyone wants them.
Thanks Jonathan for pointing out this discussion on tech-net...