netbsd-bugs: kern/10222: panic: ifree: range

Subject: kern/10222: panic: ifree: range
To: None <gnats-bugs@gnats.netbsd.org>
From: None <Manuel.Bouyer@asim.lip6.fr>
List: netbsd-bugs
Date: 05/29/2000 00:01:12
>Number:         10222
>Category:       kern
>Synopsis:       panic: ifree: range
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon May 29 00:02:00 PDT 2000
>Closed-Date:
>Last-Modified:
>Originator:     Manuel Bouyer
>Release:        -current as of May, 26
>Organization:
	LIP6/ASIM
>Environment:
	
System: NetBSD disco 1.4Y NetBSD 1.4Y (DS20-siop) #0: Fri May 26 17:04:19 MEST 2000 bouyer@disco:/home/src/sys/arch/alpha/compile/DS20-siop alpha

disco:/home/bouyer>showmount -e
Exports list on localhost:
/users/disco1                      xxx.xxx.xxx.0
disco:/home/bouyer>df -ki /users/disco1
Filesystem  1K-blocks     Used    Avail Capacity iused   ifree  %iused  Mounted on
/dev/raid2e  88784755  4406538 79938979     5%   51025 6039213     0%   /users/disco1

20 nfsd processes

>Description:
	I tried to crash my NFS server and I succeeded. I ran the following
	two test scripts on 18 clients:
#! /bin/csh
while (1)
zcat /users/disco1/bouyer/gcc-2.95.2.tar.gz | tar xf -
rm -rf gcc-2.95.2
end

and
#! /bin/csh
while (1)
tar cf /dev/null .
end

That is, several machines access the same tree at once: a tar xf may be
running on one client while another is running rm -rf on the same directory.
Sample output from the commands:
tar: Cannot add file gcc-2.95.2/gcc/config/sparc/sol2-c1.asm: No such file or directory
tar: Error exit delayed from previous errors
and
tar: Cannot add file gcc-2.95.2/texinfo/util: No such file or directory
tar: Cannot add file gcc-2.95.2/install: No such file or directory
tar: Cannot add file gcc-2.95.2/gcc/config/i386/xm-cygwin.h: No such file or directory

It ran this way for about 4 hours, with an average traffic of about 3MB/s on
the gigabit ethernet; then it panicked with:
panic: ifree: range: dev = 0x1014, ino = 1954047342, fs = /users/disco1
 
Stopped in nfsd at      cpu_Debugger+0x4:       ret     zero,(ra)
db>
db> tr
cpu_Debugger() at cpu_Debugger+0x4
panic() at panic+0xec
ffs_freefile() at ffs_freefile+0x74
ffs_vfree() at ffs_vfree+0x2c
ufs_inactive() at ufs_inactive+0x140
vput() at vput+0xe4
nfsrv_readdirplus() at nfsrv_readdirplus+0x11a0
nfssvc_nfsd() at nfssvc_nfsd+0x628
sys_nfssvc() at sys_nfssvc+0x6f4
syscall() at syscall+0x1d0
XentSys() at XentSys+0x50
--- syscall (155, netbsd.sys_nfssvc) ---
--- user mode ---

I haven't investigated much, but it appears that vput() is called in
nfsrv_readdirplus() when an error occurs or the file is gone. This looks
like a race condition.
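
As a rough sanity check on the panic message, the inode number it prints can
be compared with the inode counts from the df output above. The sketch below
is only an illustration in Python, not kernel code: it assumes the "ifree:
range" panic fires when the inode number is not below the total number of
inodes in the filesystem, and it approximates that total as iused + ifree.
The helper names are made up for this example.

```python
# Inode counts copied from the `df -ki /users/disco1` output in this report.
IUSED = 51025
IFREE = 6039213

# Inode number printed by the panic message.
PANIC_INO = 1954047342


def inode_in_range(ino, total_inodes):
    """Approximate the range check assumed to guard the "ifree: range" panic."""
    return 0 <= ino < total_inodes


def raw_bytes(ino):
    """Return the 32-bit value as it would be laid out in little-endian memory."""
    return ino.to_bytes(4, "little")


# Approximate inode total for the filesystem (assumption: iused + ifree).
total = IUSED + IFREE

# True if the inode number from the panic could be a valid inode here.
panic_ino_valid = inode_in_range(PANIC_INO, total)
```

With these numbers the filesystem has about 6 million inodes, so 1954047342
cannot be a valid inode number and panic_ino_valid is False. The four bytes
of that value, read in little-endian order as on the alpha, are the printable
ASCII characters "next". This may hint that the inode was read from memory
that had already been freed and reused, but that is only a guess.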

>How-To-Repeat:
	Concurrent and conflicting access to the same exported tree from
	several NFS clients, e.g. the two scripts above run in parallel.
>Fix:
	Workaround: don't do that! :)
>Release-Note:
>Audit-Trail:
>Unformatted: