Current-Users archive


Re: Strange behavior on new machine (ioapic, nfs)



On Tue, Jun 22, 2010 at 03:39:48PM +0100, Patrick Welche wrote:
> On Mon, Jun 21, 2010 at 07:12:11PM +0100, Patrick Welche wrote:
> > On Mon, Jun 21, 2010 at 05:54:55AM -0700, Paul Goyette wrote:
> > > Problem #2 is an issue with nfs (client).  Because of #1, I have
> > > added a USB-based network interface for temporary connectivity.  I
> > > can ping across this interface to my nfs server without any problems
> > > (including large, 2k+ byte pings), and I can mount the remote file
> > > systems.  I can even do a ``df'' to see the mounted file systems.
> > > But as soon as I try to access a file on the remote fs, all access
> > > to that fs, including a new ``df'', hang in tstile.  Even while the
> > > fs is hung, I can still ping between the nfs client and server.
> > 
> > All I can say is "me too". I am trying to netboot a poweredge R200, which
> > has two bge0 Broadcom BCM5721. Tried both NetBSD-current/amd64 and i386.
> > The NFS server is NetBSD-current/amd64 with nfe0.
> > 
> > Netbooting OK up to point where root partition needs nfs mounting. I can
> > use a root partition on a CD. Can mount the nfs root partition on /mnt.
> > chroot /mnt then hangs, or tar -xzvpf - from tar files on nfs mounted
> > partitions hangs. tcpdump shows 17 byte udp packets sent every now and
> > then from server to client. Nothing else is going on. (no response from
> > client back to server).
> 
> All run NetBSD-current/amd64, and the nfs server has hw checksumming
> enabled, and
> net.inet.tcp.recvbuf_auto=1
> net.inet.tcp.sendbuf_auto=1
> net.inet.tcp.recvbuf_max=16777216
> net.inet.tcp.sendbuf_max=16777216
> 
> server nfe0 <---> netgear gigabit switch <---> netboot client bge0
>               1G                           1G
> 
> works - all gigabit.
> 
> 
> server nfe0 <---> netgear gigabit <---> 3com 4400 <---> netboot client bge0
>               1G                   100M            100M
> 
> fails as per paragraph above. Same work/fail pattern observed with a client
> with wm0 instead of bge0.
> 
> server nfe0 <---> netgear gigabit <---> 3com 4400 <---> netboot client bge0
>               1G                   100M             1G
> 
> also fails. (Next is to try another switch instead of 4400 to see if
> it is number of hops - 4400 is one of the main lab switches, and everyone
> else is happy - tried different ports)

Just had the opportunity to try some other switches - server and client
running with code from 19th and 21st July 2010 respectively:
(bge0 as above, wm0 02901I mobile (AMT) LAN Controller, rev 3)

Works:
 server nfe0 <---> netgear gigabit <---> 3com 2824 <---> netboot client wm0
               1G                    1G              1G

Fails:
 server nfe0 <---> netgear gigabit <---> 3com 4400 <---> netboot client wm0
               1G                   100M           100M

 server nfe0 <---> netgear gigabit <---> 3com 3300 <---> netboot client wm0
               1G                   100M           100M

 server nfe0 <--->   3com 2824     <---> 3com 3300 <---> netboot client wm0
               1G                   100M           100M

I didn't retest the previous failures (4400 has a 1G module):
 server nfe0 <---> netgear gigabit <---> 3com 4400 <---> netboot client wm0
 server nfe0 <---> netgear gigabit <---> 3com 4400 <---> netboot client bge0
               1G                   100M             1G

Again the failure comes after the 12 megabytes of kernel have been
downloaded successfully, just as the client tries to mount its root
partition. The dhcp request with the root partition details is received, then

  lookup fh for dev OK
  lookup fh for console fails (fair enough, init makes a tmpfs /dev)
  lookup fh for sbin OK
  lookup fh for init OK, access attr, getattr OK
  read 30736 bytes of init (init is that long) gets

IP (tos 0x0, ttl 64, id 15, offset 0, flags [none], proto UDP (17), length 132)
    client.2479824598 > server.nfs: 104 read fh 18,36/6081646 30736 bytes @ 0
IP (tos 0x0, ttl 64, id 4860, offset 0, flags [+], proto UDP (17), length 1500, bad cksum 0 (->d27d)!)
    server.nfs > client.2479824598: reply ok 1472 read REG 555 ids 0/0 sz 30736 nlink 1 rdev 3313/5896 fsid 1224 fileid 5ccc6e a/m/ctime 1279731571.984613442 1279639970.000000 1279642650.115731470 30736 bytes EOF
IP (tos 0x0, ttl 64, id 4860, offset 1480, flags [+], proto UDP (17), length 1500, bad cksum 0 (->d1c4)!)
    server> client: udp

The last line repeats, and the client doesn't boot.
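
For reference, a rough sketch of the fragmentation arithmetic behind the
trace above. It assumes a 1500 byte MTU and a 20 byte IPv4 header with no
options, and it counts only the UDP header plus the 30736 bytes of file
data, so the RPC/NFS reply overhead is ignored and the result is a lower
bound. The function name and constants are made up for illustration.

```python
def ip_fragments(ip_payload_len, mtu=1500, ip_header_len=20):
    """Return (offset, total_length, more_fragments) for each IPv4 fragment."""
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_fragment = (mtu - ip_header_len) // 8 * 8
    fragments = []
    offset = 0
    while offset < ip_payload_len:
        chunk = min(per_fragment, ip_payload_len - offset)
        more = offset + chunk < ip_payload_len
        fragments.append((offset, chunk + ip_header_len, more))
        offset += chunk
    return fragments


UDP_HEADER_LEN = 8   # bytes
READ_SIZE = 30736    # bytes of init requested in the trace

# Lower bound: the real reply also carries RPC and NFS attribute data.
frags = ip_fragments(UDP_HEADER_LEN + READ_SIZE)
print(len(frags))
print(frags[0], frags[1])
```

The first two tuples come out as offset 0 and offset 1480, each 1500 bytes
long with the more-fragments flag set, which is what tcpdump shows. A single
read reply of this size needs at least 21 such fragments, and the client can
only reassemble the datagram if every one of them arrives.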

I think the bad cksum is reported because I have hardware checksumming
enabled on the server (tcpdump sees the packet before the NIC has filled in
the checksum), so that's fine.
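
A small sanity check on the two expected checksums tcpdump prints (d27d and
d1c4), as a sketch only. The two fragment headers differ only in the
fragment offset field, which goes from 0 to 1480/8 = 185, so the correct
header checksums should differ by 185 as well. The addresses below are
placeholders (the real ones are not in the trace), so the absolute values
will not match; only the difference is meaningful.

```python
import struct


def ipv4_header_checksum(header: bytes) -> int:
    """One's complement of the one's complement sum of the 16-bit words."""
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def build_header(ident, offset_bytes, more_fragments, total_len, src, dst):
    """20-byte IPv4 header: tos 0, ttl 64, proto UDP, checksum field zero."""
    flags_and_offset = (0x2000 if more_fragments else 0) | (offset_bytes // 8)
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, total_len, ident,
                       flags_and_offset, 64, 17, 0, src, dst)


# Placeholder addresses, NOT taken from the trace.
SRC = bytes([192, 0, 2, 1])
DST = bytes([192, 0, 2, 2])

first = ipv4_header_checksum(build_header(4860, 0, True, 1500, SRC, DST))
second = ipv4_header_checksum(build_header(4860, 1480, True, 1500, SRC, DST))

# One's complement arithmetic wraps modulo 0xFFFF.
print((first - second) % 0xFFFF)
print(0xD27D - 0xD1C4)
```

Both lines print the same difference, so the values tcpdump expects are
consistent with well-formed fragment headers whose checksum field has simply
not been filled in yet. That fits the offload explanation, though it doesn't
prove it.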

Cheers,

Patrick

