Subject: Re: daily crashes with 1.6.1
To: NetBSD-current Discussion List <current-users@NetBSD.ORG>
From: Tim Middleton <x@Vex.Net>
List: current-users
Date: 07/04/2003 16:20:45
Greg A. Woods wrote:
> It is also different hardware. :-)
Hi Greg.
D'Arcy reports that he has the exact same hardware running 1.6.1 at Givex on
several boxes without issues... except one. One of their boxes (same
hardware) *is* having problems too. The common denominator *seems* to be that
both of these boxes are running NFS... but we have not yet been able to
confirm that this is the problem.
> Do your scripts also first write to /etc/ptmp (after carefully creating
> it with O_CREAT|O_EXCL) then run "pwd_mkdb -p /etc/ptmp" (i.e. in the
Of course not. You know us better than that. <-; Actually, we don't write to
/etc/ptmp on Vex; we use a different filename...
> (it might not be such a bad idea to add an flock() call for /etc/ptmp to
Yeah, that's a good idea. I've long meant to rewrite the scripts on Vex to
work in a fundamentally different way that would avoid all the potential
problems: unique temp files, system logins stored in a separate file rather
than the one being overwritten, etc. But you know how it is... I haven't got
there yet.
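For what it's worth, here is a minimal sketch of the locking you describe:
create the temp file with O_CREAT|O_EXCL, then take an flock() on it before
writing the new master.passwd and running "pwd_mkdb -p". It is in Python
purely for illustration; it is NOT what our Vex scripts do, lock_ptmp is a
hypothetical name, and the path is a parameter (defaulting to /etc/ptmp) so
it can be tried on a scratch file:

```python
import fcntl
import os


# Hypothetical helper, for illustration only. The path defaults to /etc/ptmp
# as discussed in the thread, but is a parameter so it can be tested safely.
def lock_ptmp(path="/etc/ptmp"):
    """Create `path` exclusively and take an exclusive flock() on it.

    Returns an open file descriptor on success, or None if the file
    already exists or another process holds a lock on it.
    """
    try:
        # O_CREAT|O_EXCL fails with EEXIST if the file is already there,
        # so two updaters can never both believe they created it.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        return None
    try:
        # Non-blocking advisory lock on top of O_EXCL (the extra flock()
        # suggested in the thread); fails at once if someone else holds it.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # We created the file, so remove it rather than leave a stale lock.
        os.close(fd)
        os.unlink(path)
        return None
    return fd
```

O_EXCL alone already guarantees that only one script creates the file; the
non-blocking flock() just adds a second, advisory check. The caller still has
to write the new contents, close the descriptor, and remove the file if it
bails out before pwd_mkdb runs.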
I can't see how this problem has anything to do with the scripting, though.
These scripts have run forever... they may have certain design problems, but
we're well familiar with any problems they've had.
> Were any of the auto-updated files corrupt in a crash after having
> disabled the cron jobs?
The only files we have found corrupted by the lockups are master.passwd
(corrupted in only about 50% of the crashes) and, once, I believe,
/etc/group.
> I would lean more towards it being a hardware problem....
What hardware would you suggest? Drive? Controller? It seems an incredible
coincidence that this hardware would fail just when we upgraded to 1.6.1,
when it had been running fine with 1.5.3 for so long.
We were, however, having SCSI controller problems when we first upgraded.
1.5.3 ran perfectly (with no hardware changes), but the 1.6.0 and 1.6.1
releases would not work with the onboard STL2 SCSI controller. We eventually
disabled it in the BIOS and put in an Adaptec card. However, -current seems
to have fixed the problems with the SCSI driver, and we've moved back to the
onboard controller for now (we're trying basically anything to stop the
crashing... we've even contemplated taking SCSI out of the equation by
dropping in IDE drives. <-:) Personally, at this point, despite my
reluctance, I want to just go back to 1.5.3, which was stable for us. I'm
quite sure these problems will disappear if we do (and if they don't, then
I'd conclude it was hardware)... but others (not mentioning any names <-:)
are against this for various reasons. So I'm fishing...
--
Tim Middleton | Cain Gang Ltd | I felt very much alone, so I took another
x@veX.net | www.Vex.Net | ginger-snap. --Greene (TWMA)