Re: FFS: wrong superblock check ~> crash

To: Mindaugas Rasiukevicius <rmind%netbsd.org@localhost>
Subject: Re: FFS: wrong superblock check ~> crash
From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
Date: Mon, 20 Oct 2014 22:10:43 +0200

On Mon, Oct 20, 2014 at 08:44:21PM +0100, Mindaugas Rasiukevicius wrote:
> Huh?  If your use case is a single / partition for everything then sure.

I don't, but the others are equally important.

> I can actually extend my statement: the system should not crash either
> in the case of a corrupt file system or of a disk failure (which may or
> may not lead to a corrupt file system).

Detectable disk failures are handled by RAID in my context.
But I've also got the case of disks that would silently corrupt data.
In this case you want to stop operations ASAP because if you keep things
running you only make things worse (corrupting good data on other members
of the RAID).

> 
> If I have a system with an array of disks and one of them fails, a crash
> would take down the whole node.  When many terabytes of data suddenly
> disappear people get unhappy;

As they would if the partition they're working on suddenly becomes read-only.
It may be less damaging to stop the server, even if only one of the
many partitions is affected (and if you have that many partitions,
maybe you should also have more servers).

> it usually costs quite a bit of money too.
> So, how about making only the *failed* segment of data unavailable?  In
> many cases this is suitable and desirable; consider distributed systems
> (you have redundancy)

Then you can take the system down, no problem. It's probably even better
to take it down completely than to let it run in degraded mode.

> or caches (where I can just discard the corrupted
> data segment and refetch it from the origin).

How can the cache know where the corruption comes from?

Anyway, I don't know of any existing software that deals gracefully with
the filesystem it's working on becoming unavailable. Remounting read-only
is the default Linux behavior and I've been hit by this many times.
Not only was the server unavailable to users, but it also created
other problems (like mail bouncing, NFS operations failing instead
of just waiting for the server to come back, etc.) while the server ran
in degraded mode before someone could stop it.

> Meanwhile, I can replace
> the failed disk and/or rebuild the corrupt file system while experiencing
> only a limited disruption.
> 
> > If the corrupted filesystem is from a corrupted USB key then not panicking
> > is probably better; but 1) USB keys usually don't have ffs on them 2)
> > in such a case it would be better to run the filesystem code in userland
> > anyway.
> 
> So now you make an assumption about file systems used on USB sticks?
> That does not matter.  Somebody can create a USB stick with a
> handcrafted superblock to crash your machine or maybe even exploit it.
> To me, that constitutes a security vulnerability.

Of course he can. He can also change the firmware of the USB stick
for this purpose. That's why untrusted USB sticks should not be mounted
using the in-kernel filesystems (but the USB stack may be a problem too).
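
For reference, the kind of mount-time sanity check this thread is about
can be sketched roughly as below. This is only an illustration under
stated assumptions: struct toy_superblock, TOY_MAGIC, the TOY_* limits
and toy_sb_validate() are all hypothetical names, not the real struct fs
or the actual ffs mount code. The idea is that values read from the disk
are treated as untrusted input and rejected with an error, while assert()
is kept for programming errors only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified superblock; NOT the real struct fs. */
struct toy_superblock {
	uint32_t magic;		/* identifies the filesystem type */
	int32_t  block_size;	/* bytes per block */
	int32_t  frag_size;	/* bytes per fragment */
	int32_t  ncg;		/* number of cylinder groups */
};

/* Illustrative constants, made up for this sketch. */
#define TOY_MAGIC	0x544f5953u
#define TOY_MIN_BSIZE	4096
#define TOY_MAX_BSIZE	65536
#define TOY_MIN_FSIZE	512
#define TOY_MAX_FRAG	8	/* max fragments per block */

static int
is_pow2(int32_t v)
{
	return v > 0 && (v & (v - 1)) == 0;
}

/*
 * Returns 0 if the fields look sane, EINVAL otherwise.
 * Bad on-disk data yields an error to the caller, never a crash.
 */
int
toy_sb_validate(const struct toy_superblock *sb)
{
	assert(sb != NULL);	/* caller bug, not disk corruption */

	if (sb->magic != TOY_MAGIC)
		return EINVAL;
	if (!is_pow2(sb->block_size) ||
	    sb->block_size < TOY_MIN_BSIZE || sb->block_size > TOY_MAX_BSIZE)
		return EINVAL;
	/* frag_size > 0 is checked here, so the division below is safe. */
	if (!is_pow2(sb->frag_size) ||
	    sb->frag_size < TOY_MIN_FSIZE || sb->frag_size > sb->block_size)
		return EINVAL;
	if (sb->block_size / sb->frag_size > TOY_MAX_FRAG)
		return EINVAL;
	if (sb->ncg <= 0)
		return EINVAL;
	return 0;
}
```

A check like this only covers a superblock that is inconsistent at mount
time; it does nothing about corruption that shows up later, and nothing
about a hostile device below the filesystem layer.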

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 ans d'experience feront toujours la difference