Re: kern/41974: panic in cpu_in_cksum / likely NFS issue

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost,spz%NetBSD.org@localhost
Subject: Re: kern/41974: panic in cpu_in_cksum / likely NFS issue
From: "Greg A. Woods" <woods%planix.ca@localhost>
Date: Tue, 24 Feb 2015 04:15:00 +0000 (UTC)

The following reply was made to PR kern/41974; it has been noted by GNATS.

From: "Greg A. Woods" <woods%planix.ca@localhost>
To: NetBSD GNATS <gnats-bugs%NetBSD.org@localhost>,
    NetBSD GNATS Administrator <gnats-admin%NetBSD.org@localhost>
Cc: 
Subject: Re: kern/41974: panic in cpu_in_cksum / likely NFS issue
Date: Mon, 23 Feb 2015 18:53:18 -0800

 Some more info, possibly useful....

 I recently, and finally, switched one of my servers from i386 to amd64
 and suddenly I get these same cpu_in_cksum uvm_fault panics almost any
 time I try to write (i.e. copy a large file) to an NFS mount point.  Not
 with every write, but it doesn't seem to take very many tries to
 reproduce.
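
 A minimal sketch of the kind of repeated large-write test described
 above.  This is not the exact command sequence used: the function name
 write_test, the scratch file name, and the sizes are all hypothetical,
 and dd from /dev/zero merely stands in for copying a real large file.

```shell
# Write a large file into a directory several times, reporting each try.
# Hypothetical helper: point the first argument at the NFS mount under test.
write_test() {
    dir=$1        # target directory (assumed to be on the NFS mount)
    size_mb=$2    # size of each test file, in megabytes
    tries=$3      # number of write attempts
    i=1
    while [ "$i" -le "$tries" ]; do
        out="$dir/cksum-test.$$.$i"
        # bs is given in bytes so the same line works with BSD and GNU dd
        if dd if=/dev/zero of="$out" bs=1048576 count="$size_mb" 2>/dev/null; then
            echo "try $i: ok"
        else
            echo "try $i: dd failed"
            return 1
        fi
        rm -f "$out"
        i=$((i + 1))
    done
}
```

 For example, "write_test /mnt/nfs 512 10" would attempt ten 512MB
 writes, where /mnt/nfs is an assumed mount point.  If the panic is
 triggered the machine goes down mid-loop, so the last "ok" line printed
 gives a rough count of how many writes it took.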

 I never ever saw this problem before with the i386 kernel.

 Both the before (i386) and after (amd64) systems were built from the
 same source tree, which is on the very tip of the netbsd-5 branch.

 These are running bare-metal on a Dell PE2950 (2x8-core, 32GB RAM).

 It doesn't make any difference whether hardware assisted check-summing
 capabilities are enabled in the ethernet interface or not.  Initial
 panics were observed with caps_enabled=0, but panics have continued with
 the following config:

 $ /sbin/ifconfig bnx1
 bnx1: flags=8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
         capabilities=3f00<IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx>
         caps_enabled=3f00<IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx>
         address: 00:1d:09:35:3c:09
         media: Ethernet autoselect (1000baseT full-duplex)
         status: active
         inet 10.0.1.129 netmask 0xffffff00 broadcast 10.0.1.255

 There's no trouble reading from remote NFS servers -- only writing to
 them as an NFS client, and perhaps only with larger files/writes.  I've
 done several full builds, and a bunch of pkgsrc builds, with sources on
 the same NFS server which fails when written to, and I've never had any
 problem with the read-only access to src and pkgsrc.  Manual tests with
 'dd' reading large files with large reads work A-OK as well
 (i.e. reading with the amd64 kernel as a client, or reading from the
 other machine with the amd64 kernel as a server).

 I.e.:  note that the amd64 kernel happily serves NFS without
 encountering this error.

 Assuming the new PE2950 that arrived today is in working order then soon
 I should be able to test if this happens in a Xen domU, and with
 NetBSD-current.

 One other possibly interesting point:  The server in this case has been
 an older PE2650 running NetBSD 4.0_STABLE, and it has a weird "tick" in
 its RAID controller and/or driver (see PR# kern/35769), which means it
 doesn't always respond to NFS requests in the most timely
 manner.  I.e. perhaps this bug is more easily tickled when the NFS
 server is slow, and/or the network connection is poor, or similar.
 Perhaps I will try using an NFS mount of my iMac; and soon I should also
 be able to cross-mount the PE2950s for testing as well (especially if
 the bug is reproducible in a Xen kernel).

 -- 
 						Greg A. Woods
 						Planix, Inc.

 <woods%planix.com@localhost>       +1 250 762-7675        http://www.planix.com/
