Re: fsync error reporting

To: tech-kern%netbsd.org@localhost
Subject: Re: fsync error reporting
From: David Holland <dholland-tech%netbsd.org@localhost>
Date: Fri, 19 Feb 2021 05:17:15 +0000

On Thu, Feb 18, 2021 at 11:00:15PM -0500, Mouse wrote:
 > > (3) I think the drawbacks of reporting user 1's I/O errors to user 2
 > > [...] mean that we should guarantee that I/O errors from *your*
 > > writes should be reported by *your* call to fsync.  [...]
 > 
 > > (3a) I don't think it's necessary to guarantee that I/O errors from
 > > other people's writes won't _also_ be reported by your fsync call,
 > > but I think any natural implementation that supports the prior
 > > guarantee will also have this property.
 > 
 > I'm not so sure.
 > 
 > A opens F
 > B opens F
 > A write()s 10 bytes at offset 10, thereby dirtying the block at offset 0
 > B write()s 10 bytes at offset 30, thereby dirtying the block at offset 0
 > Kernel now pushes the block at offset 0 and gets a hardware error
 > 
 > Now: which of the writes errored and thus should get an error at next
 > fsync?  A's?  B's?  Both?  If B's write is instead at offset 10, so it
 > completely overwrites A's, does that change your answer?
 > 
 > I think the only sane answer to the first question is "both".  This
 > then leaves open the question of how to ensure _both_ fsync()s error.
 > I don't see any particularly "natural" way to do that that won't also
 > report errors due to someone else's write.  Maybe I'm just missing
 > something.

Well, if both A and B update a block and then it errors on writeback,
both their writes failed, regardless of whether B overwrote A's data
entirely or not. So I think reporting to both is correct. If they
write to different blocks, though, they won't see each other's errors
under the scheme I proposed.
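
To make the A/B example concrete, here is a toy model in Python of one
way the behavior described above could work. It is only a sketch and
not the kernel implementation: the names OpenFile, BlockCache and
BLOCK_SIZE are made up for illustration, and it assumes the cache
remembers which open file dirtied each block.

```python
import errno

BLOCK_SIZE = 512  # assumed block size for this toy model


class OpenFile:
    """One open() of a file; hypothetical stand-in for a per-open object."""

    def __init__(self, name):
        self.name = name
        self.dirtied = set()     # blocks this handle wrote, not yet pushed
        self.pending_error = 0   # errno to report at next fsync, 0 if none


class BlockCache:
    """Toy cache that remembers which handles dirtied each block."""

    def __init__(self):
        self.writers = {}        # block number -> set of OpenFile

    def write(self, handle, offset, length):
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        for blk in range(first, last + 1):
            self.writers.setdefault(blk, set()).add(handle)
            handle.dirtied.add(blk)

    def writeback(self, blk, ok):
        """Push one block; on failure, flag every handle that dirtied it."""
        for handle in self.writers.pop(blk, set()):
            handle.dirtied.discard(blk)
            if not ok:
                handle.pending_error = errno.EIO

    def fsync(self, handle):
        """Return and clear the handle's pending error (0 means success)."""
        # A real fsync would first push handle.dirtied; the caller of this
        # toy model drives writeback() explicitly instead.
        err, handle.pending_error = handle.pending_error, 0
        return err
```

With A writing at offset 10 and B at offset 30, both land in block 0,
so a failed writeback(0, ok=False) makes fsync return EIO for both
handles. A third handle that only wrote at offset 600 (block 1) gets 0.
Whether B's write overlaps A's makes no difference, since tracking is
per block rather than per byte range.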

 > > everything that process wrote is on disk,
 > 
 > That is probably unattainable, since I've seen it plausibly asserted
 > that some disks lie, reporting that writes are on the media when this
 > is not actually true.

Indeed. What I meant to say is that everything has been sent to the
disk, as opposed to being accidentally skipped in the cache because
the buffer was busy, which currently happens on some of the fsync
paths.

That's why flushing the disk-level caches was a separate point.
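
The difference between "sent to disk" and "skipped because the buffer
was busy" can be shown with a small Python toy model. This is a sketch
under stated assumptions, not the actual fsync code: Buf, the two
flush functions, and the wait_for callback are all hypothetical names,
and the "disk" is just a list of block numbers that were sent.

```python
class Buf:
    """Toy cache buffer with a dirty flag and a busy flag."""

    def __init__(self, blkno, dirty=True, busy=False):
        self.blkno = blkno
        self.dirty = dirty
        self.busy = busy


def flush_skipping_busy(bufs, disk):
    """Single pass that skips busy buffers; may leave dirty data unsent."""
    for b in bufs:
        if b.dirty and not b.busy:
            disk.append(b.blkno)
            b.dirty = False


def flush_waiting(bufs, disk, wait_for):
    """Wait for each busy buffer instead of skipping it."""
    for b in bufs:
        if not b.dirty:
            continue
        if b.busy:
            wait_for(b)          # hypothetical: returns once b is released
        disk.append(b.blkno)
        b.dirty = False


def release(b):
    """Stand-in for sleeping until the buffer's owner releases it."""
    b.busy = False
```

Given three dirty buffers where the middle one is busy, the skipping
variant sends only the first and last and returns with the middle one
still dirty. The waiting variant sends all three. Neither says
anything about the disk's own write cache, which is a separate matter.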

 > However, I think the kernel can be excused for believing a lie from the
 > device in that regard, especially since there really isn't much
 > alternative.

Right.

 > > (8) I'm not convinced that there's any real value in reporting
 > > exactly what blocks failed.
 > 
 > Is there any interface to do so via?

No; we'd have to make one up, which doesn't seem worthwhile.

-- 
David A. Holland
dholland%netbsd.org@localhost

Follow-Ups:
- Re: fsync error reporting
  - From: Greg Troxel

References:
- fsync error reporting
  - From: David Holland
- Re: fsync error reporting
  - From: Mouse
