Re: Network lossage with 4.99.59

On Thu, 17 Apr 2008, Peter Bex wrote:

> On Thu, Apr 17, 2008 at 04:47:54AM -0700, Paul Goyette wrote:
> > I've updated one of my local machines to 4.99.59 (after Geoff Wing came
> > up with a fix for the aiboost problem), and now I'm seeing some strange
> > network lossage.
> >
> > An ssh session from another machine (running 4.99.55) just randomly
> > drops with an error similar to
> >
> >     Received disconnect from 2: Bad packet length 3493832540.
> >
> > And overnight backups (using amanda) also failed with a "Broken pipe"
> >
> > Anyone else seeing such strange behavior?
>
> Yes, I was seeing the exact same behaviour.  See my message from a few
> hours ago "Random network lossage on nfe(4)".
>
> Are you also using nfe?  Or an nforce based mobo?  Even regular PCI
> network cards I added to that machine didn't work anymore, but that
> was already so with 4.0.  It may have been the case that the mobo was
> fried, since I moved the cards to a new machine with another chipset
> and it works just perfect.

Yes!  I'm also using nfe on the 4.99.59 system.

Additional data points:
1. After the ssh session dies, further attempts to ssh from the
   4.99.55 system to the 4.99.59 system hang after sshd forks the
   priv'd and non-priv'd processes (the non-priv'd process hangs in
   [net]).  The [priv] copy cannot be killed, and restarting sshd
   does not help.

2. At the same time, I had a 'make release' running on the 4.99.59
   system.  The /usr/src directory is NFS-exported from the 4.99.59
   system to the 4.99.55 system.  At some point (trigger unknown),
   the 'make release' just hung, even though the 4.99.55 system was
   still able to look at files on the NFS volume!

3. Other network activity seems to work, including being able to
   ping in both directions, and to establish an FTP session from the
   4.99.55 system to ftpd on 4.99.59.
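
For what it's worth, the "Bad packet length" number in the ssh error
above is consistent with a corrupted byte stream rather than a real
length.  Per the SSH binary packet protocol (RFC 4253), each packet
starts with a 32-bit big-endian length field, and the receiver rejects
values outside a sane range.  The sketch below only illustrates that
check; the names and the exact limits are my assumptions (modeled on
what I believe OpenSSH uses), not code taken from sshd.

```python
import struct

# Assumed limits, modeled on OpenSSH's packet sanity check; the exact
# values may differ between versions.
MIN_PACKET_LEN = 1 + 4        # padding-length byte plus minimum padding
MAX_PACKET_LEN = 256 * 1024   # assumed upper bound on packet size


def packet_length(header: bytes) -> int:
    """Decode the big-endian uint32 length field that starts an SSH packet."""
    (length,) = struct.unpack(">I", header[:4])
    return length


def is_plausible(length: int) -> bool:
    """Return True if the length falls inside the assumed valid range."""
    return MIN_PACKET_LEN <= length <= MAX_PACKET_LEN


# The value reported in the disconnect message.
reported = 3493832540
raw = struct.pack(">I", reported)

print(raw.hex())                          # the four bytes as received
print(is_plausible(packet_length(raw)))   # far above the assumed maximum
```

Since the length field is read after decryption, damaged bytes anywhere
in the first cipher block would decode to an essentially random 32-bit
value like this one, which would fit with data being corrupted
somewhere below ssh.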

