Subject: Re: fsck & other things
To: None <kpneal@pobox.com>
From: None <emre@module.vsrc.uab.edu>
List: netbsd-users
Date: 03/29/2001 20:33:24

On Thursday 29 March 2001 20:09 US Central Time, Kevin P. Neal wrote:
> "Uh, don't do that!"
>
> Corruption isn't caused by fsck, corruption is found and fixed by
> it. The filesystem is a state machine and fsck examines it to
> determine the state it was in when the machine went down. Then fsck
> can correct the damage to bring the filesystem back into a
> consistent state. If you have hardware problems that prevent fsck
> from fixing things then priority number 1 is to fix those problems.


I know that, but fsck is the only case where the disks time out.
Maybe I wasn't specific enough: when the machine crashes for whatever
reason and reboots, the timeouts happen while fsck is checking the
filesystem.  Everything is fine during normal operation, even when
the disks are under very heavy load.

The problem only happens when fsck checks the filesystem at
bootup.  I'm too scared to run fsck manually while the system is
up and running, because I might have to reinstall if the disks
time out and I can't shut down cleanly.


> If fsck seems to trigger a crash then that crash can probably be
> triggered by find / -print. The nightly scripts do just that sort
> of find, so you are asking for trouble trying to run with a system
> that falls over on medium disk I/O.

The nightly scripts don't make the disks time out, for some reason.
Like I said above, the disks don't time out under heavy usage.

> The kernel filesystems assume that the disks are stable underneath
> and the filesystem is consistent. If you run the kernel with an
> inconsistent filesystem because you didn't fsck then you haven't
> solved anything. Worse, you've pushed problems off into a random
> point in the near future when you will be nailed with a kernel
> panic and the resulting loss of data.

Yeah, I'm aware of that :(
I can't seem to find any other solution, though.  I really don't want
to put Linux on this machine either :(

> Never disable fsck unless you need to for a quick hack and you know
> *exactly* what you are doing. If you have to ask how to disable
> fsck on boot then you do not know *exactly* what you are doing.

So I have to figure out why it's timing out in the first place.  I posted
here earlier and got some replies.  Some said it might be the power supply;
others said it might be the IDE cables.  I checked all of that and found
nothing suspicious.  The only thing I can think of now is the speed.  Maybe
the disks are too fast for the motherboard/CPU, or vice versa?  Is there
a way to slow down the disks in NetBSD (like downgrading the DMA mode
or something)?

Thanks