Source-Changes-HG archive


[src/trunk]: src/sys/ufs Various minor LFS improvements:



details:   https://anonhg.NetBSD.org/src/rev/ac03fa3eca05
branches:  trunk
changeset: 574261:ac03fa3eca05
user:      perseant <perseant%NetBSD.org@localhost>
date:      Sat Feb 26 05:40:42 2005 +0000

description:
Various minor LFS improvements:

* Note when lfs_putpages(9) thinks it is not going to be writing any
  pages before calling genfs_putpages(9).  This prevents a situation in
  which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
  overestimate in most cases.  Note that if NRESERVE() is too high, it
  may be impossible to create files on the filesystem.  We catch this
  case on filesystem mount and refuse to mount r/w.
* Allow filesystems whose block size equals MAXBSIZE to be mounted.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
  entries in indirect blocks again, triggering a failed assertion "daddr
  <= LFS_MAX_DADDR".  Explicitly convert to and from int32_t to correct
  this.
* Add a high-water mark for the number of dirty pages any given LFS can
  hold before triggering a flush.  This is settable by sysctl, but off
  (zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
  shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
  even though their v_size == 0.  Don't panic when we see this.
* Change lfs_bfree to a signed quantity.  The manner in which it is
  processed before being passed to the cleaner means that sometimes it
  may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
  lfs_statvfs(9).  This prevents df(1) from ever telling us that our full
  filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
  associated buffer headers, so that the pagedaemon doesn't run us out
  of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
  unmounted.  Because vfs_busy() is a shared lock, and
  lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
  holding the lock that umount() is blocking on, then try to vfs_busy()
  again in getnewvnode().
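
To illustrate the two signedness fixes above (the int32_t round trip after
ufs_bmaparray(9) and the bfree clamp in lfs_statvfs(9)), here is a minimal
standalone sketch.  It is not the kernel code: the function names, the
SKETCH_* constants, the sentinel value -2, and the 4096-byte block size
used in the note below are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel constants; values are assumed. */
#define SKETCH_UNWRITTEN ((int64_t)-2)          /* "not yet written" sentinel */
#define SKETCH_MAX_DADDR ((int64_t)0x7fffffff)  /* largest valid 32-bit address */

/*
 * Widen a 32-bit on-disk block address by zero extension.  A negative
 * sentinel stored on disk comes back as a large positive number.
 */
int64_t
widen_zero_extended(uint32_t ondisk)
{
	return (int64_t)ondisk;
}

/*
 * Widen by going through int32_t first, so the sign bit is extended and
 * a negative sentinel survives.  (The uint32_t to int32_t conversion is
 * implementation-defined in C; this assumes the usual two's complement
 * behaviour.)
 */
int64_t
widen_sign_extended(uint32_t ondisk)
{
	return (int64_t)(int32_t)ondisk;
}

/*
 * Clamp a signed free-block count to [0, dsize] before reporting it,
 * so a transiently negative counter is never shown to userland.
 */
int64_t
report_bfree(int32_t bfree, int64_t dsize)
{
	assert(dsize >= 0);
	if (bfree < 0)
		return 0;
	if (bfree > dsize)
		return dsize;
	return bfree;
}

/*
 * What an unclamped report would look like if the same counter were
 * reinterpreted as unsigned and multiplied by the block size.
 */
uint64_t
bytes_if_unsigned(int32_t bfree, uint32_t blocksize)
{
	return (uint64_t)(uint32_t)bfree * blocksize;
}
```

With these definitions, widen_zero_extended(0xfffffffe) yields 4294967294,
which exceeds the assumed maximum, while widen_sign_extended(0xfffffffe)
yields -2.  Likewise, bytes_if_unsigned(-1, 4096) is just under 2^44 bytes,
roughly 16TB, which would match the df(1) symptom described above if the
block size were 4096 bytes.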

diffstat:

 sys/ufs/lfs/TODO            |    9 +--
 sys/ufs/lfs/lfs.h           |   63 +++++++++++++----
 sys/ufs/lfs/lfs_alloc.c     |    9 +-
 sys/ufs/lfs/lfs_balloc.c    |  109 +++++++++++++++++++++++++++++++-
 sys/ufs/lfs/lfs_bio.c       |  102 ++++++++++++++++++++---------
 sys/ufs/lfs/lfs_extern.h    |   14 +++-
 sys/ufs/lfs/lfs_segment.c   |   17 ++--
 sys/ufs/lfs/lfs_subr.c      |   21 ++++-
 sys/ufs/lfs/lfs_syscalls.c  |   24 ++++++-
 sys/ufs/lfs/lfs_vfsops.c    |  150 +++++++++++++++++++++++++++++++------------
 sys/ufs/lfs/lfs_vnops.c     |   44 +++++++++--
 sys/ufs/ufs/ufs_readwrite.c |    5 +-
 12 files changed, 433 insertions(+), 134 deletions(-)

diffs (truncated from 1344 to 300 lines):

diff -r 8f86e89ad850 -r ac03fa3eca05 sys/ufs/lfs/TODO
--- a/sys/ufs/lfs/TODO  Sat Feb 26 02:57:32 2005 +0000
+++ b/sys/ufs/lfs/TODO  Sat Feb 26 05:40:42 2005 +0000
@@ -1,11 +1,7 @@
-#   $NetBSD: TODO,v 1.7 2003/02/23 00:22:33 perseant Exp $
+#   $NetBSD: TODO,v 1.8 2005/02/26 05:40:42 perseant Exp $
 
 - Lock audit.  Need to check locking for multiprocessor case in particular.
 
-- Get rid of the syscalls: make them into ioctl calls instead.  This would
-  allow LFS to be loaded as a module.  We would then ideally have an
-  in-kernel cleaner that runs if no userland cleaner has asserted itself.
-
 - Get rid of lfs_segclean(); the kernel should clean a dirty segment IFF it
   has passed two checkpoints containing zero live bytes.
 
@@ -23,9 +19,6 @@
   locking problem in lfs_{bmapv,markv} goes away and lfs_reserve can go,
   too.
 
-- Fully working fsck_lfs.  (Really, need a general-purpose external
-  partial-segment writer.)
-
 - Get rid of DEV_BSIZE, pay attention to the media block size at mount time.
 
 - More fs ops need to call lfs_imtime.  Which ones?  (Blackwell et al., 1995)
diff -r 8f86e89ad850 -r ac03fa3eca05 sys/ufs/lfs/lfs.h
--- a/sys/ufs/lfs/lfs.h Sat Feb 26 02:57:32 2005 +0000
+++ b/sys/ufs/lfs/lfs.h Sat Feb 26 05:40:42 2005 +0000
@@ -1,4 +1,4 @@
-/*     $NetBSD: lfs.h,v 1.74 2004/08/14 14:32:04 mycroft Exp $ */
+/*     $NetBSD: lfs.h,v 1.75 2005/02/26 05:40:42 perseant Exp $        */
 
 /*-
  * Copyright (c) 1999, 2000, 2001, 2002, 2003 The NetBSD Foundation, Inc.
@@ -77,6 +77,7 @@
 #define LFS_DEBUG_RFW          /* print roll-forward debugging info */
 #define LFS_LOGLENGTH      1024 /* size of debugging log */
 #define LFS_MAX_ACTIVE    10   /* Dirty segments before ckp forced */
+#define LFS_PD                 /* pagedaemon codaemon */
 
 /* #define DEBUG_LFS */                 /* Intensive debugging of LFS subsystem */
 
@@ -111,9 +112,11 @@
 /* Resource limits */
 #define LFS_MAX_BUFS       ((nbuf >> 2) - 10)
 #define LFS_WAIT_BUFS      ((nbuf >> 1) - (nbuf >> 3) - 10)
-extern u_long bufmem; /* XXX */
-#define LFS_MAX_BYTES      ((bufmem >> 2) - 10 * PAGE_SIZE)
-#define LFS_WAIT_BYTES     ((bufmem >> 1) - (bufmem >> 3) - 10 * PAGE_SIZE)
+#define LFS_INVERSE_MAX_BUFS(n) (((n) + 10) << 2)
+extern u_long bufmem_lowater, bufmem_hiwater; /* XXX */
+#define LFS_MAX_BYTES      ((bufmem_lowater >> 2) - 10 * PAGE_SIZE)
+#define LFS_INVERSE_MAX_BYTES(n) (((n) + 10 * PAGE_SIZE) << 2)
+#define LFS_WAIT_BYTES     ((bufmem_lowater >> 1) - (bufmem_lowater >> 3) - 10 * PAGE_SIZE)
 #define LFS_MAX_DIROP      ((desiredvnodes >> 2) + (desiredvnodes >> 3))
 #define LFS_MAX_PAGES \
      (((uvmexp.active + uvmexp.inactive + uvmexp.free) * uvmexp.filemin) >> 8)
@@ -121,7 +124,6 @@
      (((uvmexp.active + uvmexp.inactive + uvmexp.free) * uvmexp.filemax) >> 8)
 #define LFS_BUFWAIT        2   /* How long to wait if over *_WAIT_* */
 
-
 /*
  * Reserved blocks for lfs_malloc
  */
@@ -466,7 +468,7 @@
 typedef struct _cleanerinfo {
        u_int32_t clean;                /* number of clean segments */
        u_int32_t dirty;                /* number of dirty segments */
-       u_int32_t bfree;                /* disk blocks free */
+       int32_t   bfree;                /* disk blocks free */
        int32_t   avail;                /* disk blocks available */
        u_int32_t free_head;            /* head of the inode free list */
        u_int32_t free_tail;            /* tail of the inode free list */
@@ -487,9 +489,11 @@
 /* Synchronize the Ifile cleaner info with current avail and bfree */
 #define LFS_SYNC_CLEANERINFO(cip, fs, bp, w) do {                      \
     if ((w) || (cip)->bfree != (fs)->lfs_bfree ||                      \
-       (cip)->avail != (fs)->lfs_avail - (fs)->lfs_ravail) {           \
+       (cip)->avail != (fs)->lfs_avail - (fs)->lfs_ravail -            \
+       (fs)->lfs_favail) {                                             \
        (cip)->bfree = (fs)->lfs_bfree;                                 \
-       (cip)->avail = (fs)->lfs_avail - (fs)->lfs_ravail;              \
+       (cip)->avail = (fs)->lfs_avail - (fs)->lfs_ravail -             \
+               (fs)->lfs_favail;                                       \
        if (((bp)->b_flags & B_GATHERED) == 0)                          \
                (fs)->lfs_flags |= LFS_IFDIRTY;                         \
        (void) LFS_BWRITE_LOG(bp); /* Ifile */                          \
@@ -590,7 +594,7 @@
 
 /* Checkpoint region. */
        u_int32_t dlfs_freehd;    /* 32: start of the free list */
-       u_int32_t dlfs_bfree;     /* 36: number of free disk blocks */
+       int32_t   dlfs_bfree;     /* 36: number of free disk blocks */
        u_int32_t dlfs_nfiles;    /* 40: number of allocated inodes */
        int32_t   dlfs_avail;     /* 44: blocks available for writing */
        int32_t   dlfs_uinodes;   /* 48: inodes in cache not yet on disk */
@@ -750,6 +754,7 @@
        pid_t lfs_rfpid;                /* Process ID of roll-forward agent */
        int       lfs_nadirop;          /* number of active dirop nodes */
        long      lfs_ravail;           /* blocks pre-reserved for writing */
+       long      lfs_favail;           /* blocks pre-reserved for writing */
        res_t *lfs_resblk;              /* Reserved memory for pageout */
        TAILQ_HEAD(, inode) lfs_dchainhd; /* dirop vnodes */
        TAILQ_HEAD(, inode) lfs_pchainhd; /* paging vnodes */
@@ -767,6 +772,7 @@
        int      lfs_cleanind;  /* Index into intervals */
        struct simplelock lfs_interlock;  /* lock for lfs_seglock */
        int lfs_sleepers;               /* # procs sleeping this fs */
+       int lfs_pages;                  /* dirty pages blaming this fs */
 };
 
 /* NINDIR is the number of indirects in a file system block. */
@@ -899,20 +905,34 @@
 #endif /* _KERNEL */
 
 /*
- * LFS inode extensions; moved from <ufs/ufs/inode.h> so that file didn't
- * have to change every time LFS changed.
+ * List containing block numbers allocated through lfs_balloc.
+ */
+struct lbnentry {
+       LIST_ENTRY(lbnentry) entry;
+       daddr_t lbn;
+};
+
+/*
+ * LFS inode extensions.
  */
 struct lfs_inode_ext {
        off_t     lfs_osize;            /* size of file on disk */
        u_int32_t lfs_effnblocks;  /* number of blocks when i/o completes */
        size_t    lfs_fragsize[NDADDR]; /* size of on-disk direct blocks */
-       TAILQ_ENTRY(inode) lfs_dchain; /* Dirop chain. */
-       TAILQ_ENTRY(inode) lfs_pchain; /* Paging chain. */
+       TAILQ_ENTRY(inode) lfs_dchain;  /* Dirop chain. */
+       TAILQ_ENTRY(inode) lfs_pchain;  /* Paging chain. */
+       /* Blocks allocated for write */
+#define LFS_BLIST_HASH_WIDTH 17
+       LIST_HEAD(, lbnentry) lfs_blist[LFS_BLIST_HASH_WIDTH];
+#define LFSI_NO_GOP_WRITE 0x01
+       u_int32_t lfs_iflags;           /* Inode flags */
 };
 #define i_lfs_osize            inode_ext.lfs->lfs_osize
 #define i_lfs_effnblks         inode_ext.lfs->lfs_effnblocks
 #define i_lfs_fragsize         inode_ext.lfs->lfs_fragsize
 #define i_lfs_dchain           inode_ext.lfs->lfs_dchain
+#define i_lfs_blist            inode_ext.lfs->lfs_blist
+#define i_lfs_iflags           inode_ext.lfs->lfs_iflags
 
 /*
  * Macros for determining free space on the disk, with the variable metadata
@@ -927,7 +947,7 @@
 #define LFS_EST_NONMETA(F) ((F)->lfs_dsize - (F)->lfs_dmeta - LFS_EST_CMETA(F))
 
 /* Estimate number of blocks actually available for writing */
-#define LFS_EST_BFREE(F) ((F)->lfs_bfree - LFS_EST_CMETA(F) - (F)->lfs_dmeta)
+#define LFS_EST_BFREE(F) ((F)->lfs_bfree > LFS_EST_CMETA(F) + (F)->lfs_dmeta ? (F)->lfs_bfree - LFS_EST_CMETA(F) - (F)->lfs_dmeta : 0)
 
 /* Amount of non-meta space not available to mortal man */
 #define LFS_EST_RSVD(F) (int32_t)((LFS_EST_NONMETA(F) *                             \
@@ -944,6 +964,13 @@
 #define IS_FREESPACE(F, BB)                                            \
          (LFS_EST_BFREE(F) >= (BB) + LFS_EST_RSVD(F))
 
+/*
+ * The minimum number of blocks to create a new inode.  This is:
+ * directory direct block (1) + NIADDR indirect blocks + inode block (1) +
+ * ifile direct block (1) + NIADDR indirect blocks = 3 + 2 * NIADDR blocks.
+ */
+#define LFS_NRESERVE(F) (btofsb((F), (2 * NIADDR + 3) << (F)->lfs_bshift))
+
 /* Statistics Counters */
 struct lfs_stats {
        u_int   segsused;
@@ -970,11 +997,15 @@
        int blkcnt;             /* number of blocks */
 };
 
-#define LFCNSEGWAITALL  _FCNW_FSPRIV('L', 0, struct timeval)
-#define LFCNSEGWAIT     _FCNW_FSPRIV('L', 1, struct timeval)
+#define LFCNSEGWAITALL  _FCNR_FSPRIV('L', 0, struct timeval)
+#define LFCNSEGWAIT     _FCNR_FSPRIV('L', 1, struct timeval)
 #define LFCNBMAPV      _FCNRW_FSPRIV('L', 2, struct lfs_fcntl_markv)
 #define LFCNMARKV      _FCNRW_FSPRIV('L', 3, struct lfs_fcntl_markv)
 #define LFCNRECLAIM     _FCNO_FSPRIV('L', 4)
+#define LFCNIFILEFH     _FCNW_FSPRIV('L', 5, struct fhandle)
+/* Compat for NetBSD 2.x error */
+#define LFCNSEGWAITALL_COMPAT   _FCNW_FSPRIV('L', 0, struct timeval)
+#define LFCNSEGWAIT_COMPAT      _FCNW_FSPRIV('L', 1, struct timeval)
 
 #ifdef _KERNEL
 /* XXX MP */
diff -r 8f86e89ad850 -r ac03fa3eca05 sys/ufs/lfs/lfs_alloc.c
--- a/sys/ufs/lfs/lfs_alloc.c   Sat Feb 26 02:57:32 2005 +0000
+++ b/sys/ufs/lfs/lfs_alloc.c   Sat Feb 26 05:40:42 2005 +0000
@@ -1,4 +1,4 @@
-/*     $NetBSD: lfs_alloc.c,v 1.73 2004/08/14 01:08:03 mycroft Exp $   */
+/*     $NetBSD: lfs_alloc.c,v 1.74 2005/02/26 05:40:42 perseant Exp $  */
 
 /*-
  * Copyright (c) 1999, 2000, 2001, 2002, 2003 The NetBSD Foundation, Inc.
@@ -67,7 +67,7 @@
  */
 
 #include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: lfs_alloc.c,v 1.73 2004/08/14 01:08:03 mycroft Exp $");
+__KERNEL_RCSID(0, "$NetBSD: lfs_alloc.c,v 1.74 2005/02/26 05:40:42 perseant Exp $");
 
 #if defined(_KERNEL_OPT)
 #include "opt_quota.h"
@@ -422,9 +422,7 @@
        struct inode *ip;
        struct ufs1_dinode *dp;
        struct ufsmount *ump;
-#ifdef QUOTA
        int i;
-#endif
        
        /* Get a pointer to the private mount structure. */
        ump = VFSTOUFS(mp);
@@ -435,6 +433,9 @@
        dp = pool_get(&lfs_dinode_pool, PR_WAITOK);
        memset(dp, 0, sizeof(*dp));
        ip->inode_ext.lfs = pool_get(&lfs_inoext_pool, PR_WAITOK);
+       memset(ip->inode_ext.lfs, 0, sizeof(*ip->inode_ext.lfs));
+       for (i = 0; i < LFS_BLIST_HASH_WIDTH; i++)
+               LIST_INIT(&(ip->i_lfs_blist[i]));
        vp->v_data = ip;
        ip->i_din.ffs1_din = dp;
        ip->i_ump = ump;
diff -r 8f86e89ad850 -r ac03fa3eca05 sys/ufs/lfs/lfs_balloc.c
--- a/sys/ufs/lfs/lfs_balloc.c  Sat Feb 26 02:57:32 2005 +0000
+++ b/sys/ufs/lfs/lfs_balloc.c  Sat Feb 26 05:40:42 2005 +0000
@@ -1,4 +1,4 @@
-/*     $NetBSD: lfs_balloc.c,v 1.48 2004/01/25 18:06:49 hannken Exp $  */
+/*     $NetBSD: lfs_balloc.c,v 1.49 2005/02/26 05:40:42 perseant Exp $ */
 
 /*-
  * Copyright (c) 1999, 2000, 2001, 2002, 2003 The NetBSD Foundation, Inc.
@@ -67,7 +67,7 @@
  */
 
 #include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: lfs_balloc.c,v 1.48 2004/01/25 18:06:49 hannken Exp $");
+__KERNEL_RCSID(0, "$NetBSD: lfs_balloc.c,v 1.49 2005/02/26 05:40:42 perseant Exp $");
 
 #if defined(_KERNEL_OPT)
 #include "opt_quota.h"
@@ -81,6 +81,7 @@
 #include <sys/mount.h>
 #include <sys/resourcevar.h>
 #include <sys/trace.h>
+#include <sys/malloc.h>
 
 #include <miscfs/specfs/specdev.h>
 
@@ -96,6 +97,8 @@
 
 int lfs_fragextend(struct vnode *, int, int, daddr_t, struct buf **, struct ucred *);
 
+u_int64_t locked_fakequeue_count;
+
 /*
  * Allocate a block, and to inode and filesystem block accounting for it
  * and for any indirect blocks the may need to be created in order for
@@ -162,6 +165,10 @@
        if (bpp)
                *bpp = NULL;
        
+       /* Bomb out immediately if there's no space left */
+       if (fs->lfs_bfree <= 0)
+               return ENOSPC;
+
        /* Check for block beyond end of file and fragment extension needed. */
        lastblock = lblkno(fs, ip->i_size);
        if (lastblock < NDADDR && lastblock < lbn) {
@@ -227,6 +234,10 @@
        error = ufs_bmaparray(vp, lbn, &daddr, &indirs[0], &num, NULL, NULL);
        if (error)
                return (error);
+
+       daddr = (daddr_t)((int32_t)daddr); /* XXX ondisk32 */
+       KASSERT(daddr <= LFS_MAX_DADDR);
+
        /*
         * Do byte accounting all at once, so we can gracefully fail *before*
         * we start assigning blocks.
@@ -295,6 +306,12 @@
        if (bpp)
                *bpp = bp = getblk(vp, lbn, blksize(fs, ip, lbn), 0, 0);
        
+       /*
+        * Do accounting on blocks that represent pages.
+        */
+       if (!bpp)
+               lfs_register_block(vp, lbn);
+
        /* 


