tech-kern archive


Re: FFS: wrong superblock check ~> crash



rmind%netbsd.org@localhost (Mindaugas Rasiukevicius) writes:

>If I have a system with an array of disks and one of them fails, a crash
>would take down the whole node.  When many terabytes of data suddenly
>disappear people get unhappy;

When a single node going down makes people unhappy, you may want to have
more than one :)

>it usually costs quite a bit of money too.
>So, how about making only the *failed* segment of data unavailable?

It all depends on what you can or are willing to handle, and on what a
crash (and reboot) would actually solve. It doesn't help if your
boot partition is damaged and you force a reboot, converting a
possibly half-working system into a dead one.

In my use case, a broken filesystem is usually a sign of an unnoticed
hardware or software error and the best reaction to recover is to
panic (and throw the data away, the machines have only temporary
data). Continuing with a read-only filesystem doesn't do any good,
because you have no means to find out whether the data you can read
is complete or correct.

Most clustered systems also handle a complete node outage better than
a node running in degraded mode. That's why you have things like STONITH.

Your workstation or laptop usually has different requirements.


