Re: kern/47231

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost,jakllsch%kollasch.net@localhost
Subject: Re: kern/47231
From: pedro martelletto <pedro%ambientworks.net@localhost>
Date: Wed, 25 Dec 2013 11:15:01 +0000 (UTC)

The following reply was made to PR kern/47231; it has been noted by GNATS.

From: pedro martelletto <pedro%ambientworks.net@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: 
Subject: Re: kern/47231
Date: Wed, 25 Dec 2013 12:12:01 +0100

 As mentioned previously, this problem is indeed caused by metadata
 pointers making it to disk before the newly allocated data blocks that
 they point to.

 The issue is further aggravated by WAPBL, since there are situations
 where the journal is pushed to disk while regular file data is not,
 which means there is a higher probability that, upon log replay, the
 pointers in the inode will be updated to reflect an ongoing allocation
 at the time of the crash.

 One way to circumvent the problem is to asynchronously push blocks in
 FFS's write routine for the '!overwrite' case:

 Index: ufs/ufs/ufs_readwrite.c
 ===================================================================
 RCS file: /cvsroot/src/sys/ufs/ufs/ufs_readwrite.c,v
 retrieving revision 1.107
 diff -u -r1.107 ufs_readwrite.c
 --- ufs/ufs/ufs_readwrite.c    23 Jun 2013 07:28:37 -0000      1.107
 +++ ufs/ufs/ufs_readwrite.c    25 Dec 2013 10:28:28 -0000
 @@ -423,6 +423,14 @@
                 * XXXUBC simplistic async flushing.
                 */

 +              if (!overwrite && vp->v_mount && vp->v_mount->mnt_wapbl) {
 +                      mutex_enter(vp->v_interlock);
 +                      error = VOP_PUTPAGES(vp,
 +                          trunc_page(oldoff & fs->fs_bmask),
 +                          round_page(ufs_blkroundup(fs, uio->uio_offset)),
 +                          PGO_CLEANIT | PGO_JOURNALLOCKED | PGO_LAZY);
 +              }
 +
   #ifndef LFS_READWRITE
                if (!async && oldoff >> 16 != uio->uio_offset >> 16) {
                        mutex_enter(vp->v_interlock);

 That is not an acceptable solution though, as it (unsurprisingly)
 disrupts write clustering:

 dd if=/dev/zero of=x bs=2048 count=1024k  0.13s user 3.87s system 88% cpu 4.493 total
 dd if=/dev/zero of=x bs=2048 count=1024k  0.38s user 10.95s system 12% cpu 1:30.63 total

 softdep's answer to this problem is to keep track of newly allocated
 data blocks, and to zero the pointers in the inode if it happens to be
 pushed to disk before the blocks have been written. We might have to do
 something similar.
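
 The pointer-zeroing idea described above can be sketched as a toy
 model in userland C. This is only an illustration under stated
 assumptions: struct toy_inode and the toy_* functions are hypothetical
 names made up for this sketch and are not part of softdep, FFS or
 WAPBL. Real code would also have to handle indirect blocks and the
 inode size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NDADDR 12	/* direct block pointers per toy inode */

/* Hypothetical in-core inode; not the real struct inode or dinode. */
struct toy_inode {
	uint32_t db[TOY_NDADDR];	/* direct block pointers */
	bool	 pending[TOY_NDADDR];	/* allocated, data not on disk yet */
};

/* Record a fresh allocation; the data block is still only in memory. */
static void
toy_alloc(struct toy_inode *ip, int lbn, uint32_t blkno)
{
	assert(lbn >= 0 && lbn < TOY_NDADDR);
	ip->db[lbn] = blkno;
	ip->pending[lbn] = true;
}

/* Called once the data block for lbn has reached stable storage. */
static void
toy_data_written(struct toy_inode *ip, int lbn)
{
	assert(lbn >= 0 && lbn < TOY_NDADDR);
	ip->pending[lbn] = false;
}

/*
 * Build the pointer image that is allowed to go to disk: pointers
 * whose data has not been written yet are rolled back to zero, so
 * the on-disk inode never references uninitialized blocks.
 */
static void
toy_inode_to_disk(const struct toy_inode *ip, uint32_t out[TOY_NDADDR])
{
	for (int i = 0; i < TOY_NDADDR; i++)
		out[i] = ip->pending[i] ? 0 : ip->db[i];
}
```

 In this model the in-core inode keeps the real pointer, so a later
 inode write, after toy_data_written() has been called, puts the
 pointer on disk.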

 Alternatives that come to my mind are a) to have these data blocks added
 to the journal, since they are needed to preserve file system integrity;
 or b) ensure they are pushed to disk every time the journal is flushed.
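
 Alternative b) amounts to an ordering rule: data first, journal
 second. A minimal sketch of that rule, again as a userland toy model;
 struct toy_journal and the toy_* functions are hypothetical names for
 this illustration, not WAPBL interfaces, and the "writes" are only
 recorded in an event log.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAXEV 16

enum toy_ev { TOY_EV_DATA, TOY_EV_JOURNAL };

/* Hypothetical journal state; not the real struct wapbl. */
struct toy_journal {
	int	    ndirty;		/* new data blocks not yet on disk */
	enum toy_ev ev[TOY_MAXEV];	/* order in which writes were issued */
	size_t	    nev;
};

/* Append one event to the log of issued writes. */
static void
toy_record(struct toy_journal *j, enum toy_ev e)
{
	assert(j->nev < TOY_MAXEV);
	j->ev[j->nev++] = e;
}

/* A write allocated a new block; its data is dirty in memory only. */
static void
toy_note_alloc(struct toy_journal *j)
{
	j->ndirty++;
}

/* Flush the journal, but only after the data it refers to is written. */
static void
toy_journal_flush(struct toy_journal *j)
{
	while (j->ndirty > 0) {
		toy_record(j, TOY_EV_DATA);
		j->ndirty--;
	}
	toy_record(j, TOY_EV_JOURNAL);
}
```

 The cost is visible even in the toy: every journal flush has to wait
 for all outstanding data writes.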

 Neither of them strikes me as particularly appealing, though. Any ideas?

 -p.
