Subject: kern/9502: Interesting LFS problem
To: None <gnats-bugs@gnats.netbsd.org>
From: Jason R Thorpe <thorpej@nas.nasa.gov>
List: netbsd-bugs
Date: 02/28/2000 14:12:41
>Number:         9502
>Category:       kern
>Synopsis:       Interesting LFS problem
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people (Kernel Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Feb 28 14:12:00 2000
>Last-Modified:
>Originator:     
>Organization:
Numerical Aerospace Simulation Facility - NASA Ames
>Release:        NetBSD 1.4T, Feb 28 2000
>Environment:
	
System: NetBSD bishop 1.4T NetBSD 1.4T (BISHOP) #1010: Thu Feb 24 16:24:46 PST 2000 thorpej@bishop:/amd/dracul/u2/netbsd/src/sys/arch/alpha/compile/BISHOP alpha


>Description:
	I needed to scrub out my object tree and decided to try using
	LFS for it again.

	I created an LFS file system and made /usr/obj point to it.  This
	file system is NFS exported, and 5 or 6 other systems mount that
	file system for their /usr/obj as well.

	While the server was churning along on a "make build", one of
	the clients (an AlphaStation 500) was also doing a "make build".

	The client failed to finish building libc:

cc -O2 -DALL_STATE  -Wall -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Werror   -D_LIBC -DNLS -DYP -DHESIOD -DLIBC_SCCS -DSYSLIBC_SCCS  -D_REENTRANT -I/amd/dracul/u2/netbsd/src/lib/libc/include -DINET6 -D__DBINTERFACE_PRIVATE -DRESOLVSORT -I. -DPOSIX_MISTAKE -DFLOATING_POINT -c /amd/dracul/u2/netbsd/src/lib/libc/net/res_query.c
ld: cannot open output file res_query.o: Input/output error
*** Error code 1

	Upon further investigation:

bishop:thorpej 103$ sudo touch obj.alpha/res_query.o                           
Password:
touch: obj.alpha/res_query.o: Input/output error
bishop:thorpej 104$ sudo touch obj.alpha/res_query.oaa
bishop:thorpej 105$ sudo rm obj.alpha/res_query.oaa
bishop:thorpej 106$ ls obj.alpha/res_query.*                                   
ls: obj.alpha/res_query.o: No such file or directory
16 obj.alpha/res_query.ln               10 obj.alpha/res_query.o.o

	The same problem happens on the server.  I'm guessing a directory
	entry is trashed.
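
	A rough sketch of how other entries in the same state could be
	found: list a directory and lstat each name, reporting any name
	that the listing returns but lstat cannot resolve (as with
	res_query.o above).  This is illustrative only and was not run
	on the affected file system; the function name
	find_dangling_entries is hypothetical.

```python
import os


def find_dangling_entries(directory):
    """Return (name, reason) pairs for names the directory listing
    reports but lstat() cannot resolve.

    Assumption: a trashed directory entry shows up the way
    res_query.o does in the transcript, i.e. the name is returned
    by the directory listing but stat-ing it fails.
    """
    dangling = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        try:
            # lstat does not follow symlinks, so an ordinary
            # dangling symlink is not reported as a bad entry.
            os.lstat(path)
        except OSError as err:
            dangling.append((name, err.strerror))
    return dangling
```

	Calling find_dangling_entries("obj.alpha") on a healthy
	directory should return an empty list.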

	Note that this may not be specific to LFS accessed via NFS; it
	may just be that this access method made the problem easier to
	tickle.

	fsck_lfs says:

dracul:thorpej 79$ fsck_lfs -n /dev/rsd3a
** /dev/rsd3a (NO WRITE)
** Last Mounted on /u3
** Phase 0 - Check Segment Summaries
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
UNALLOCATED  I=6643 
INO is NULL

REMOVE? no

** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
17011 files, 139303 used, 0 free 
dracul:thorpej 80$ 

>How-To-Repeat:
	Not sure... it "just happened".  I'll try to reproduce it
	after I re-newfs_lfs it.  (I can't remove that blasted file.)

	However, I'll keep this file system around in case Konrad
	has anything he wants me to try :-)

>Fix:
	Unknown.
>Audit-Trail:
>Unformatted: