Subject: Re: Followup to #5073
To: Dominic J Hulewicz <firstname.lastname@example.org>
From: Greg Wohletz <greg@duke.CS.UNLV.EDU>
Date: 07/29/1998 02:17:46
>I reported a problem (#5073) back in February about a NetBSD 1.3 i386
>host that regularly panics with "vrele: ref cnt". After swapping out
>all the hardware and completely reinstalling with 1.3.2, the problem
>still occurred so I knew that it must be something I was running.
>I eventually traced it to an accidental duplication of rsync processes.
>I have two scripts on another machine sending data via rsync+ssh to the
>NetBSD host: one sends at two-minute intervals, the other every five
>minutes. At some point the two separate copy commands had been merged
>into a single script, but the cron entry for the second script had not
>been removed. This meant that every ten minutes, two rsync processes
>would collide and try to mirror the same directory to the same machine.
>I guess the (bad) luck of the machine panicking is down to some sort of
>race condition / timing issue that occurs. I would imagine the problem
>could be easily replicated by setting up two or more timed rsync
>sessions set to go off at the same time.
I'm not doing anything of this sort; my system is just an NFS server for 50
or so Unix workstations. It gets these panics quite frequently. Doing a
dump of the filesystem seems to greatly increase the odds that a panic will
occur. I also submitted a PR (#5026).
Looking at the -current source, it appears that the vget/vrele code is
undergoing significant change; maybe the new implementation will manage to
avoid whatever this mysterious problem is.
If anyone is curious, what I have discovered about this bug is included in
the PR. I also have various core dumps / kernels available, if folks want to
look at them, at http://www.egr.unlv.edu/~greg/netbsd/
Currently I have a really ugly workaround in place for this panic, which
seems to be working out OK, but it won't really get tested until fall.