NetBSD-Bugs archive


Re: kern/58553: ffs: garbage data appended after crash



The following reply was made to PR kern/58553; it has been noted by GNATS.

From: Robert Elz <kre%munnari.OZ.AU@localhost>
To: campbell+netbsd%mumble.net@localhost
Cc: gnats-bugs%netbsd.org@localhost
Subject: Re: kern/58553: ffs: garbage data appended after crash
Date: Mon, 05 Aug 2024 01:51:02 +0700

     Date:        Sun,  4 Aug 2024 15:30:01 +0000 (UTC)
     From:        campbell+netbsd%mumble.net@localhost
     Message-ID:  <20240804153001.C637C1A923F%mollari.NetBSD.org@localhost>
 
 
   | 1. start a write-heavy workload
 
 That's not necessarily needed ... I've seen this kind of thing happen on
 an almost idle system, where the metadata updates were all done, but the
 file data still hadn't been written to disk when the system crashed
 (sudden complete power loss, I think it was) about 12 hours after the
 application had written the data.   The data writes depend upon something
 in the system bothering to do them, and while a write-heavy workload means
 that's likely not to take too long, without one it can sometimes be a
 very long time.
 
 In my case it was easy to tell, as the data that was "lost" (not really,
 I had copies) was incoming e-mail - the mail files all appeared to be
 there, with appropriate modification times, sizes, etc., but garbage
 contents.
 [Since then I have my own replacement for update(8) running all the time!]
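 
 Just to illustrate the idea (this is only a minimal sketch, not what I
 actually run), a user-level stand-in for update(8) need not be much more
 than a loop around sync(2):
 
 /*
  * Minimal sketch of a user-level update(8) substitute: call sync(2)
  * on a fixed interval so dirty data doesn't sit in memory forever.
  * The 10 second interval is an arbitrary choice for illustration.
  */
 #include <unistd.h>
 
 int
 main(void)
 {
         for (;;) {
                 sync();         /* schedule all dirty buffers for writing */
                 sleep(10);
         }
         /* NOTREACHED */
 }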
 
   | >Fix:
 
 I doubt that can be called a fix.   A hack which might work around some
 of the issues - perhaps the most common case - but not a fix.
 
 Two major issues I can see ... first, nothing in your proposal covers
 the case of data overwrites, where the metadata (other than the mtime)
 isn't being altered at all, but several blocks of data are being written
 somewhere in the middle of a file - some of those might reach the disk,
 and others not, leaving the file in a state which is neither its before
 state nor the intended after state.   Your "at the end" case is just the
 common case of that, but to be considered a fix, all of it would need
 fixing.
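 
 To make the overwrite case concrete, here is a rough sketch (the file
 name, block size, and offsets are just assumptions for illustration) of
 the kind of operation the proposal doesn't cover - several blocks
 rewritten in the middle of an existing file, with nothing forcing them
 to disk:
 
 /*
  * Rewrite a few blocks in the middle of an existing file with no
  * fsync().  If the system crashes before the syncer gets to them,
  * any subset of these blocks may have reached the disk, so the file
  * ends up with a mix of old and new data - neither its before state
  * nor the intended after state.  Assumes "victim" already exists and
  * is larger than about 104 blocks.
  */
 #include <sys/types.h>
 #include <fcntl.h>
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
 
 #define BLK     8192            /* assumed file system block size */
 
 int
 main(void)
 {
         char buf[BLK];
         int fd, i;
 
         memset(buf, 'N', sizeof(buf));  /* the "new" data */
         if ((fd = open("victim", O_WRONLY)) == -1)
                 exit(1);
         /* rewrite blocks 100..103, nowhere near the end of the file */
         for (i = 0; i < 4; i++)
                 (void)pwrite(fd, buf, sizeof(buf), (off_t)(100 + i) * BLK);
         /* no fsync(); a crash here leaves old and new blocks mixed */
         close(fd);
         return 0;
 }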
 
 And:
 
   | 3. Change ffs_fsync and ffs_full_fsync so that if they are syncing any
   | prefix of the interval [k0, k1],
 
 And if not syncing a prefix - but some data in the middle?   It is easy
 enough just not to update things in that case, but sometime later, when
 the earlier part of the interval has been written, the record would need
 to grow to cover all of these other blocks, as those writes won't happen
 again.   The typical solution to that is to split the record in two on
 any write to a segment inside the interval, one record for what is still
 to come before the written part and one for what comes after, omitting
 either, or both, of those if empty.   In hard cases that can deteriorate
 into a real mess.
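 
 A rough sketch of what that splitting amounts to (the structure and
 names here are invented, purely to show the shape of it):
 
 /*
  * A pending record covers the byte range [r_start, r_end) of data not
  * yet known to be on disk.  When some range [w0, w1) inside it has now
  * been written, replace the record with up to two records: one for
  * what still precedes the written part, one for what follows it,
  * dropping either if it is empty.  Assumes [w0, w1) overlaps the
  * record.
  */
 #include <sys/types.h>
 
 struct rec {
         off_t   r_start;        /* first byte not yet known written */
         off_t   r_end;          /* one past the last such byte */
 };
 
 /* returns the number of surviving records placed in out[0..1] */
 static int
 split_record(const struct rec *r, off_t w0, off_t w1, struct rec out[2])
 {
         int n = 0;
 
         if (r->r_start < w0) {          /* part before the write */
                 out[n].r_start = r->r_start;
                 out[n].r_end = w0;
                 n++;
         }
         if (w1 < r->r_end) {            /* part after the write */
                 out[n].r_start = w1;
                 out[n].r_end = r->r_end;
                 n++;
         }
         return n;                       /* 0, 1 or 2 records remain */
 }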
 
 However:
 
   | (We can also use truncate(n,k) records to make truncate itself atomic
 
 that one probably would be a benefit, though I doubt it is useful enough
 to justify adding this extra mechanism and forgoing backward compat.
 
 After all, everything needed to finish a truncate is in the metadata: if
 the size says the file should be 100 bytes, and there are blocks allocated
 beyond that, those can easily be removed during file system cleanup after
 the crash.   That is, we can deduce what was happening from the state
 that remains.
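 
 That cleanup step is simple enough to sketch (the helper functions here
 are hypothetical, just to show the shape of the deduction, not anything
 taken from fsck_ffs):
 
 /*
  * If the inode's size says the file needs only "needed" blocks but
  * more than that are allocated, the extras are left over from an
  * unfinished truncate and can simply be freed.
  */
 #include <sys/types.h>
 
 /* hypothetical helpers assumed to exist in the cleanup tool */
 extern u_int    nblocks_allocated(ino_t);
 extern void     free_file_block(ino_t, u_int);
 
 static void
 finish_truncate(ino_t ino, off_t size, u_int bsize)
 {
         u_int needed = (size + bsize - 1) / bsize;  /* blocks the size implies */
         u_int have = nblocks_allocated(ino);
 
         while (have > needed)
                 free_file_block(ino, --have);       /* drop blocks past EOF */
 }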
 
 This is all much much harder than it looks.   If we really believe some
 kind of better method is needed, we should probably bribe Kirk to come
 and make softdeps work in NetBSD.   Not that even that would be a full
 solution: data corruption after a crash is extremely hard to avoid without
 doing fully synchronous (all the way to the flash or platter) I/O - which
 is not something most people would tolerate most of the time.
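 
 From the application's point of view, "fully synchronous" means something
 like the following - every write is meant to reach stable storage (modulo
 whatever write cache the drive itself has) before it returns, which is
 exactly why most people won't put up with it:
 
 /*
  * Open with O_SYNC so each write(2) is pushed to stable storage
  * before it returns.  Safe against the garbage-after-crash problem,
  * but far too slow for most workloads.
  */
 #include <fcntl.h>
 #include <unistd.h>
 
 int
 write_safely(const char *path, const void *buf, size_t len)
 {
         int fd;
         ssize_t n;
 
         if ((fd = open(path, O_WRONLY | O_CREAT | O_SYNC, 0644)) == -1)
                 return -1;
         n = write(fd, buf, len);        /* returns only once data is on disk */
         close(fd);
         return (n == (ssize_t)len) ? 0 : -1;
 }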
 
 kre
 

