NetBSD-Bugs archive
Re: kern/47231
The following reply was made to PR kern/47231; it has been noted by GNATS.
From: pedro martelletto <pedro%ambientworks.net@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc:
Subject: Re: kern/47231
Date: Wed, 25 Dec 2013 12:12:01 +0100
As mentioned previously, this problem is indeed caused by metadata
pointers making it to disk before the newly allocated data blocks that
they point to.
The issue is further aggravated by WAPBL, since there are situations
where the journal is pushed to disk while regular file data is not.
This raises the probability that, upon log replay, the pointers in the
inode will be updated to reflect an allocation that was still in
progress at the time of the crash, leaving them pointing at blocks
whose new contents never reached the disk.
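To make the ordering hazard concrete, here is a toy model (not kernel
code) of a crash followed by log replay. Every name in it (toy_disk,
toy_journal_rec, toy_replay, toy_crash_exposes_stale) is hypothetical
and exists only for this sketch; the real WAPBL replay path is far more
involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBLOCKS 4
#define BLKSIZE 8

/* Toy "disk": a few data blocks plus one inode with a single pointer. */
struct toy_disk {
	char data[NBLOCKS][BLKSIZE];
	int  inode_ptr;		/* -1 means "no block allocated" */
};

/* A journaled metadata update: "the inode now points at block blkno". */
struct toy_journal_rec {
	bool valid;
	int  blkno;
};

/* Replay the journal after a crash; only metadata is restored. */
static void
toy_replay(struct toy_disk *d, const struct toy_journal_rec *rec)
{
	if (rec->valid) {
		assert(rec->blkno >= 0 && rec->blkno < NBLOCKS);
		d->inode_ptr = rec->blkno;
	}
}

/*
 * Simulate a write that allocates block 2. The journal record always
 * reaches the disk; the file data reaches it only if data_flushed is
 * true. Returns true if, after crash and replay, the inode points at
 * stale contents.
 */
static bool
toy_crash_exposes_stale(bool data_flushed)
{
	struct toy_disk d;
	struct toy_journal_rec rec = { true, 2 };
	const char fresh[BLKSIZE] = "newdata";

	memset(&d, 0, sizeof(d));
	d.inode_ptr = -1;
	memcpy(d.data[2], "oldjunk", BLKSIZE);	/* previous owner's data */

	if (data_flushed)
		memcpy(d.data[2], fresh, BLKSIZE);

	/* crash here, then replay the log */
	toy_replay(&d, &rec);
	return memcmp(d.data[d.inode_ptr], fresh, BLKSIZE) != 0;
}
```

In this model toy_crash_exposes_stale(false) reports the failure
described above, while toy_crash_exposes_stale(true) does not.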
One way to circumvent the problem is to asynchronously push blocks in
FFS's write routine for the '!overwrite' case:
Index: ufs/ufs/ufs_readwrite.c
===================================================================
RCS file: /cvsroot/src/sys/ufs/ufs/ufs_readwrite.c,v
retrieving revision 1.107
diff -u -r1.107 ufs_readwrite.c
--- ufs/ufs/ufs_readwrite.c 23 Jun 2013 07:28:37 -0000 1.107
+++ ufs/ufs/ufs_readwrite.c 25 Dec 2013 10:28:28 -0000
@@ -423,6 +423,14 @@
 		 * XXXUBC simplistic async flushing.
 		 */

+		if (!overwrite && vp->v_mount && vp->v_mount->mnt_wapbl) {
+			mutex_enter(vp->v_interlock);
+			error = VOP_PUTPAGES(vp,
+			    trunc_page(oldoff & fs->fs_bmask),
+			    round_page(ufs_blkroundup(fs, uio->uio_offset)),
+			    PGO_CLEANIT | PGO_JOURNALLOCKED | PGO_LAZY);
+		}
+
 #ifndef LFS_READWRITE
 		if (!async && oldoff >> 16 != uio->uio_offset >> 16) {
 			mutex_enter(vp->v_interlock);
That is not an acceptable solution though, as it (unsurprisingly)
disrupts write clustering. Timings without and with the patch,
respectively:
dd if=/dev/zero of=x bs=2048 count=1024k 0.13s user 3.87s system 88% cpu 4.493 total
dd if=/dev/zero of=x bs=2048 count=1024k 0.38s user 10.95s system 12% cpu 1:30.63 total
softdep's answer to this problem is to keep track of newly allocated
data blocks, and to zero the pointers in the inode if it happens to be
pushed to disk before the blocks have been written. We might have to do
something similar.
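As a rough illustration of that idea, the sketch below keeps a
"data pending" flag per block pointer and substitutes zero for any
pointer whose data has not been written when the on-disk copy of the
inode is built. This is a simplified model, not softdep's actual data
structures; sd_inode, sd_alloc, sd_data_written, sd_inode_to_disk and
SD_NDADDR are names made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SD_NDADDR 12	/* number of direct pointers in this toy inode */

struct sd_inode {
	int32_t db[SD_NDADDR];		 /* in-memory block pointers */
	bool    data_pending[SD_NDADDR]; /* true until data is on disk */
};

/* Record a fresh allocation: pointer is set, data not yet written. */
static void
sd_alloc(struct sd_inode *ip, int lbn, int32_t blkno)
{
	assert(lbn >= 0 && lbn < SD_NDADDR);
	ip->db[lbn] = blkno;
	ip->data_pending[lbn] = true;
}

/* Called once the data block itself has reached the disk. */
static void
sd_data_written(struct sd_inode *ip, int lbn)
{
	assert(lbn >= 0 && lbn < SD_NDADDR);
	ip->data_pending[lbn] = false;
}

/*
 * Build the image of the inode that goes to disk. Pointers to blocks
 * whose data is still pending are rolled back to zero, so a crash can
 * never leave the on-disk inode pointing at unwritten blocks. The
 * in-memory inode is left untouched.
 */
static void
sd_inode_to_disk(const struct sd_inode *ip, int32_t ondisk[SD_NDADDR])
{
	for (int i = 0; i < SD_NDADDR; i++)
		ondisk[i] = ip->data_pending[i] ? 0 : ip->db[i];
}
```

The cost of this approach is bookkeeping: the inode has to be written
again after the data lands for the real pointer to become durable.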
Alternatives that come to mind are a) to have these data blocks added
to the journal, since they are needed to preserve file system integrity;
or b) to ensure they are pushed to disk every time the journal is
flushed. Neither of them strikes me as particularly appealing, though.
Any ideas?
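For what it is worth, alternative b) could be modeled roughly as below:
newly allocated blocks are remembered, and the journal flush drains
that list first. This is only a sketch under invented names (jf_state,
jf_note_alloc, jf_journal_flush, JF_MAXPENDING); it says nothing about
how WAPBL would actually hook this in, and the counters merely stand in
for real I/O.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define JF_MAXPENDING 8

struct jf_state {
	int    pending[JF_MAXPENDING];	/* allocated, data not on disk */
	size_t npending;
	int    data_writes;		/* data blocks pushed so far */
	int    journal_flushes;		/* journal flushes performed */
};

/* Remember a newly allocated block; fails if the list is full. */
static bool
jf_note_alloc(struct jf_state *s, int blkno)
{
	assert(s != NULL);
	if (s->npending == JF_MAXPENDING)
		return false;
	s->pending[s->npending++] = blkno;
	return true;
}

/*
 * Flush the journal, but only after every pending data block has been
 * pushed, so replayed pointers always refer to written blocks.
 */
static void
jf_journal_flush(struct jf_state *s)
{
	assert(s != NULL);
	for (size_t i = 0; i < s->npending; i++)
		s->data_writes++;	/* stand-in for writing pending[i] */
	s->npending = 0;
	s->journal_flushes++;
}
```

The obvious drawback is that every journal flush now waits on data
I/O, which is likely to hurt clustering in much the same way as the
patch above.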
-p.