Subject: kern/12324:
To: None <gnats-bugs@gnats.netbsd.org>
From: None <frueauf@netbsd.org>
List: netbsd-bugs
Date: 03/03/2001 21:06:56
>Number:         12324
>Category:       kern
>Synopsis:       
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Mar 03 12:07:00 PST 2001
>Closed-Date:
>Last-Modified:
>Originator:     Thorsten Frueauf
>Release:        <NetBSD-current source date> 1.5.1_ALPHA from 03.03.2001
>Organization:
private
	
>Environment:
	
nfs server:
		NetBSD/i386 kernel and userland: 1.5.1_ALPHA from 13.01.2001
		Toshiba Tecra 500CDT (named cyberlap)

nfs client:
		NetBSD/hp300 kernel: 1.5.1_ALPHA from 03.03.2001
		             userland: 1.5.1_ALPHA from 03.12.2000
		HP 9000/400T (named cybersil)

>Description:
	
My laptop (named cyberlap) exports several filesystems via nfs:

[frueauf@cyberlap]/root# showmount -e
Exports list on localhost:
/usr                               cyberdyne cybersil cyberdec-1 cyberdec-2 cyberdec-3 cyberdec-4 sunlap sparc1 
/space2                            cyberlap cyberdyne cybersil cyberdec-1 cyberdec-2 cyberdec-3 cyberdec-4 sunlap sparc1 
/space1                            cyberlap cyberdyne cybersil cyberdec-1 cyberdec-2 cyberdec-3 cyberdec-4 sunlap sparc1 

In particular, /space1/src holds the NetBSD 1.5.1_ALPHA sources, checked out
via anoncvs on 03.03.2001.

I have several other NetBSD machines which mount that filesystem to build
their OS.

I built a new kernel on my HP 9000/400T, booted from it and tried to build
userland. But the nfs mount hangs very soon after starting "make includes".

At the same time my Sun SPARCstation 5 mounts the same filesystem from
cyberlap and builds userland without any hangs at all.

I can still run a successful "showmount -e cyberlap" on the hp300 system.
I can even mount the same filesystem on a different mount point. But
reading a file from the new mount point hangs immediately, even though a
preceding "ls -la" worked.

In /var/log/messages on the hp300 I see "nfs server not responding".
When I try to ^C the build, it takes about 20 minutes until
"nfs server alive again" appears and the ^C gets through. I can then umount
and remount the FS. But it hangs again as soon as I try to read a file.

On the nfs server I saw the following in /var/log/messages:
Mar  3 19:44:44 cyberlap /netbsd: nfsd send error 55
Mar  3 19:46:33 cyberlap /netbsd: nfsd send error 55
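
For reference, the number in these messages is an errno value. On NetBSD
(as on other BSD-derived systems) errno 55 is ENOBUFS, "No buffer space
available", per <sys/errno.h>. The sketch below extracts the number from such
a log line and looks it up in a small hand-written table. The table and the
helper name decode_nfsd_error are illustrative assumptions, not part of any
NetBSD tool; the table is hard-coded because Python's own errno module follows
the numbering of the host it runs on, which may differ.

```python
import re

# Hand-copied subset of BSD errno values from <sys/errno.h>; illustrative only.
BSD_ERRNO = {
    55: ("ENOBUFS", "No buffer space available"),
    60: ("ETIMEDOUT", "Operation timed out"),
    65: ("EHOSTUNREACH", "No route to host"),
}

LOG_RE = re.compile(r"nfsd send error (\d+)")

def decode_nfsd_error(line):
    """Return (number, name, text) for an 'nfsd send error N' line, else None."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    num = int(m.group(1))
    name, text = BSD_ERRNO.get(num, ("UNKNOWN", "not in this table"))
    return num, name, text

line = "Mar  3 19:44:44 cyberlap /netbsd: nfsd send error 55"
print(decode_nfsd_error(line))
```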

Running tcpdump on the server, I see the following sequence repeated about
once every few minutes:

20:12:56.363125 cybersil.312929716 > cyberlap.nfs: 132 read [|nfs] (ttl 64, id 8822)
20:12:56.365557 cyberlap.nfs > cybersil.312929716: reply ok 1472 read REG 644 ids 4105/9 [|nfs] (frag 60028:1480@0+) (ttl 64)
20:12:56.366474 cyberlap > cybersil: (frag 60028:1480@1480+) (ttl 64)
20:12:56.367358 cyberlap > cybersil: (frag 60028:1480@2960+) (ttl 64)
20:12:56.368314 cyberlap > cybersil: (frag 60028:1480@4440+) (ttl 64)
20:12:56.369553 cyberlap > cybersil: (frag 60028:1480@5920+) (ttl 64)
20:12:56.370771 cyberlap > cybersil: (frag 60028:1480@7400+) (ttl 64)
20:12:56.371998 cyberlap > cybersil: (frag 60028:1480@8880+) (ttl 64)
20:12:56.373229 cyberlap > cybersil: (frag 60028:1480@10360+) (ttl 64)
20:12:56.374459 cyberlap > cybersil: (frag 60028:1480@11840+) (ttl 64)
20:12:56.375691 cyberlap > cybersil: (frag 60028:1480@13320+) (ttl 64)
20:12:56.376920 cyberlap > cybersil: (frag 60028:1480@14800+) (ttl 64)
20:12:56.378151 cyberlap > cybersil: (frag 60028:1480@16280+) (ttl 64)
20:12:56.379384 cyberlap > cybersil: (frag 60028:1480@17760+) (ttl 64)
20:12:56.380620 cyberlap > cybersil: (frag 60028:1480@19240+) (ttl 64)
20:12:56.381847 cyberlap > cybersil: (frag 60028:1480@20720+) (ttl 64)
20:12:56.383075 cyberlap > cybersil: (frag 60028:1480@22200+) (ttl 64)
20:12:56.384306 cyberlap > cybersil: (frag 60028:1480@23680+) (ttl 64)
20:12:56.385537 cyberlap > cybersil: (frag 60028:1480@25160+) (ttl 64)
20:12:56.386769 cyberlap > cybersil: (frag 60028:1480@26640+) (ttl 64)
20:12:56.388000 cyberlap > cybersil: (frag 60028:1480@28120+) (ttl 64)
20:12:56.389230 cyberlap > cybersil: (frag 60028:1480@29600+) (ttl 64)
20:12:56.390467 cyberlap > cybersil: (frag 60028:1480@31080+) (ttl 64)
20:12:56.391693 cyberlap > cybersil: (frag 60028:344@32560) (ttl 64)
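
The fragment list above can be checked arithmetically. Each entry has the
form (frag id:length@offset), and a trailing "+" means more fragments follow.
The sketch below sums the fragments of the reply: 22 fragments of 1480 bytes
plus a final one of 344 bytes give a 32904-byte IP payload, that is 32896
bytes of UDP data after the 8-byte UDP header. This would be consistent with a
32768-byte NFS read plus RPC/NFS reply headers, but the read size is an
inference and is not stated anywhere in this report. The helper name
summarize_fragments is hypothetical, and the input string is rebuilt from the
offsets shown in the trace rather than pasted in full.

```python
import re

# Matches tcpdump fragment descriptors such as "(frag 60028:1480@0+)".
FRAG_RE = re.compile(r"\(frag (\d+):(\d+)@(\d+)(\+?)\)")

def summarize_fragments(trace):
    """Return (fragment_count, total_ip_payload_bytes) for one datagram.

    Raises ValueError if the fragments are not contiguous or if the last
    fragment still has the more-fragments flag set.
    """
    frags = []
    for m in FRAG_RE.finditer(trace):
        # Store as (offset, length, more_fragments) so sorting orders by offset.
        frags.append((int(m.group(3)), int(m.group(2)), m.group(4) == "+"))
    if not frags:
        raise ValueError("no fragments found")
    frags.sort()
    expected = 0
    for offset, length, _more in frags:
        if offset != expected:
            raise ValueError("gap or overlap at offset %d" % offset)
        expected = offset + length
    if frags[-1][2]:
        raise ValueError("last fragment has more-fragments set")
    return len(frags), expected

# Fragment descriptors as they appear in the trace: 22 full fragments
# followed by a final 344-byte fragment without the trailing '+'.
trace = "\n".join("(frag 60028:1480@%d+)" % off for off in range(0, 32560, 1480))
trace += "\n(frag 60028:344@32560)"

count, ip_payload = summarize_fragments(trace)
print(count, ip_payload, ip_payload - 8)  # 8 = UDP header length
```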


If I boot the hp300 with a kernel built from 1.5.1_ALPHA sources checked out
on 03.12.2000, I don't see the problem, i.e. "make includes" goes through.


>How-To-Repeat:
	
Export /src from an i386 system and mount it on an hp300 system. Doing this
with kernels built from 1.5.1_ALPHA sources as of 03.03.2001 will show the
above problem.

Mounting it on a sparc client does not show the problem.
>Fix:
	
I have not the slightest idea :-(
>Release-Note:
>Audit-Trail:
>Unformatted:
 nfs problems: hung nfs mounts with 1.5.1_ALPHA, i386 server, hp300 client