Subject: Re: daily crashes with 1.6.1
To: None <current-users@netbsd.org>
From: Greg A. Woods <woods@weird.com>
List: current-users
Date: 07/04/2003 12:46:50
[ On Friday, July 4, 2003 at 11:20:18 (-0400), Tim Middleton wrote: ]
> Subject: daily crashes with 1.6.1
>
> Until a few weeks ago a box of ours running 1.5.3 was very stable. Since 
> upgrading to 1.6.1 it crashes several times a day. We have 1.6.1 on some boxes 
> which are stable, and have not been able to determine the cause of
> the instability on this one particular box. The main difference between
> this box and the others is that it is an NFS server.

It is also different hardware.  :-)

> Several times the master.passwd file has been corrupt after rebooting, and had 
> to be restored... interestingly it has been corrupt in the exact same way 
> each time... overwritten by a chunk of our named.conf.

Do your scripts also first write to /etc/ptmp (after carefully creating
it with O_CREAT|O_EXCL) then run "pwd_mkdb -p /etc/ptmp" (i.e. in the
same way as ACI's)?

(It might not be a bad idea to add an flock() call for /etc/ptmp to
those scripts, though normal vipw et al. should also fail if /etc/ptmp
exists, even if it isn't locked.)
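
For illustration only, here is a minimal sketch of the create-then-lock
sequence described above, written in Python because its os.open() and
fcntl.flock() map directly onto the system calls. The helper names
(acquire_ptmp, write_all, release_ptmp) are hypothetical and are not taken
from anyone's actual scripts. A real updater would write to /etc/ptmp and
then run "pwd_mkdb -p /etc/ptmp"; the sketch takes the path as a parameter
and leaves that step out so it can be tried safely on a scratch file.

```python
import fcntl
import os


def acquire_ptmp(path):
    """Create the lock file exclusively and flock() it.

    Returns an open file descriptor, or None if another updater is active.
    """
    try:
        # O_CREAT|O_EXCL fails if the file already exists, so only one
        # updater at a time can get past this point.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        return None
    try:
        # Additional advisory lock; LOCK_NB avoids hanging a cron job.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # We created the file ourselves, so it is ours to clean up.
        os.close(fd)
        os.unlink(path)
        return None
    return fd


def write_all(fd, data):
    """Write every byte of data and force it to disk before returning."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]
    os.fsync(fd)


def release_ptmp(fd, path):
    """Close the descriptor (dropping the flock) and remove a leftover file."""
    os.close(fd)
    try:
        os.unlink(path)
    except FileNotFoundError:
        # Already gone, e.g. moved into place by a later install step.
        pass
```

A script built this way would call acquire_ptmp(), give up if it returns
None, write the new contents with write_all(), run the database rebuild,
and finally call release_ptmp(). The fsync() is there so that a crash
shortly after the update is less likely to leave a half-written file.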

> but, overwriting the master.passwd file would not cause a box to lock to the 
> point of not responding at all to pings, would it?

Not unless there's some serious corruption inside the kernel which gets
triggered by one of the system calls involved (e.g. especially flock()).

> And also we disabled those 
> cron scripts, and the box still eventually locked up... though at least the 
> password files were not corrupt in these cases.

Well there you go!  ;-)

Were any of the auto-updated files corrupt in a crash after having
disabled the cron jobs?


> Also we're not sure how this could be related to our current prime suspect, 
> NFS, as the password files are not on an NFS-related partition.

I would lean more towards it being a hardware problem....


-- 
								Greg A. Woods

+1 416 218-0098;            <g.a.woods@ieee.org>;           <woods@robohack.ca>
Planix, Inc. <woods@planix.com>; VE3TCP; Secrets of the Weird <woods@weird.com>