Subject: Re: daily crashes with 1.6.1
To: None <current-users@netbsd.org>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: current-users
Date: 07/05/2003 16:02:02
On Fri, Jul 04, 2003 at 11:20:18AM -0400, Tim Middleton wrote:
> [...]
> Several times the master.passwd file has been corrupt after rebooting, and had 
> to be restored... interestingly it has been corrupt in the exact same way 
> each time... overwritten by a chunk of our named.conf. Both the password file 
> and named.conf are re-written frequently by a cron script on this 
> system... so it would seem to indicate that whatever the problem is, it may be 
> triggered by file writing. (We were paranoid that something had gone wrong 
> with our scripts, causing them to overwrite the master.passwd file somehow... 
> but, overwriting the master.passwd file would not cause a box to lock to the 
> point of not responding at all to pings, would it? And also we disabled those 
> cron scripts, and the box still eventually locked up... though at least the 
> password files were not corrupt in these cases).
> 
> Also we're not sure how this could be related to our current prime suspect, 
> NFS, as the password files are not on an NFS-related partition.

Well, this could be the same as a problem I've seen.
I run mrtg from cron to gather statistics about the local system (interesting
system parameters and counters, UPS parameters, etc).
Occasionally I get a mail from cron because the mrtg command failed:
Perl complains about syntax errors in some files (usually in perl modules).
I first suspected a bug in perl. But now that I think about it ...
I've only seen this on boxes which are NFS servers.
Note that it may not be related to NFS, only more frequent on NFS servers
because of the high number of vnodes in use, buffer page recycling, etc.
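
One way to check whether files that should never change (such as the perl
modules) really do read back differently at times would be to record their
checksums once and compare them from cron. What follows is only a minimal
sketch in Python, not something from this thread: the function names
(digest, snapshot, changed) are hypothetical, and it assumes the watched
files are not legitimately rewritten between runs.

```python
import hashlib


def digest(path):
    """Return the SHA-256 hex digest of the file's contents as read now."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large files are not loaded at once.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(paths):
    """Map each path (as a string) to its current digest."""
    return {str(p): digest(p) for p in paths}


def changed(baseline, paths):
    """Return the sorted paths whose digest differs from the baseline.

    A path missing from the baseline is also reported as changed.
    """
    current = snapshot(paths)
    return sorted(p for p, d in current.items() if baseline.get(p) != d)
```

A baseline would be taken once with snapshot() and stored; a later cron run
would call changed() with that baseline and mail any non-empty result. The
sketch only reports that a file read back differently. It cannot tell by
itself whether the on-disk copy or only a cached page was damaged.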

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 24 years of experience will always make the difference
--