Subject: Re: experiments after recent LFS changes
To: Konrad Schroder <perseant@hhhh.org>
From: Daniel Carosone <dan@geek.com.au>
List: current-users
Date: 05/03/2005 09:29:29
--E9SPMlsjsjqOlA3h
Content-Type: multipart/mixed; boundary="B7FF+pnjRCrp0ISR"
Content-Disposition: inline


--B7FF+pnjRCrp0ISR
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, May 02, 2005 at 02:10:49PM +1000, Daniel Carosone wrote:
> moreover I can't yet reproduce *any* other problem with LFS.

I knew I'd be speaking too soon.. :)

> It's getting pretty slow for some of the writing processes at
> points, especially with both cleaners going the disk is doing a
> *lot* of reading, but it is Still Going!

Actually, it was more than slow.  These processes eventually all got
stuck in lfs_rese(rve?); they could be killed easily enough, which
then allowed other processes (like an ls) waiting on vnlock to
proceed.  However, any new writers quickly got stuck in the same place
- even after a full reboot and fsck_lfs. Something in this filesystem
is now damaged.

Mounting the filesystem -n, and manually starting a cleaner with -d
shows an endless loop; the messages aren't entirely clear and there
are several worrying basic-sanity type checks failing, but if I read
them correctly one of the segments (#411) seems to be marked for
coalescing, the cleaner starts a new cleaner child to do this, and
then both loop endlessly. =20

A sort | uniq -c of these messages is attached (for brevity), as is a
dumplfs -s 411. The longer form of these messages (with original
ordering) available on request, as is any other output or even the
full 10gb vnd image if needed.

If I separately start just a coalescing cleaner, it also seems to just
loop endlessly emitting the free() warning. Both fsck_lfs and dump_lfs
think the filesystem is fine, but the cleaner is unhappy with it. =20

I wondered whether running the two cleaners at once was a bad idea; I
ran a separate quick test overnight with just the default cleaner (and
to raw disk, not vnd) and a couple of postmarks.  That seemed to reach
the same point (postmark processes hung in lfs_rese), but when I ^C'd
the first one, the box instantly locked up solid.

--
Dan.
--B7FF+pnjRCrp0ISR
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="dumplfs.out"
Content-Transfer-Encoding: quoted-printable

Master Superblock at 0x8:
    magic    0x70162     version  0x2         size     10485760 =20
    ssize    1048576     dsize    9961392     bsize    8192     =20
    fsize    1024        frag     8           minfree  10       =20
    inopb    8           ifpb     409         nindir   2048     =20
    nseg     10239       sepb     341         cleansz  1        =20
    segtabsz 31          segmask  0x0         segshift 0        =20
    bmask    0x1fff      bshift   13          ffmask   0x3FF    =20
    ffshift  10          fbmask   0x7         fbshift  3
    sushift  0           fsbtodb  1           cksum    0x5e05   =20
    nclean   2           dmeta    134349      minfreeseg 511      =20
    roll_id  0x760e0584  interleave 0         sumsize  1024     =20
    seg0addr 0           maxfilesize  0x400801018000
  Superblock disk addresses:
     0x8        0xffc00    0x1ff800   0x2ff400   0x3ff000   0x4fec00 =20
     0x5fe800   0x6fe400   0x7fe000   0x8fdc00 =20
  Checkpoint Info
    freehd   91          idaddr   0x6c2f69    ifile    1        =20
    uinodes  0           bfree    1371231     avail    -259946  =20
    nfiles   284696      lastseg  0x6c2400    nextseg  0x6c9800 =20
    curseg   0x6c2c00    offset   0x6c2f6a    serial   110520   =20
    tstamp   Mon May  2 15:56:40 2005

IFILE inode
    mode  o100600	nlink 1	uid   0	gid   0	size  6766592
    atime Thu Jan  1 10:00:00 1970
    mtime Thu Jan  1 10:00:00 1970
    ctime Thu Jan  1 10:00:00 1970
    inum  1
    Direct Addresses
	0x6c2eed	0x4a3f3a	0x13def2	0x40db5a	0x63b372	0x40cdea
	0x480029	0x480031	0x480039	0x63b37a	0x4a3e99	0x69a732
	0x6c2f61	0x0	0x0

IFILE contents
free_head 91
free_tail 122603
clean	2	dirty	10237
bfree	1371253	avail	-259820


SEGMENT 411 (Disk Address 0x66c00)
Segment Summary Info at 0x66c00
    next     0x67000	nfinfo   1	ninos    0	flags    --
    sumsum   0x360	datasum  0x2ad0	create   Mon May  2 12:40:52 2005
    roll_id  760e0584   serial   4276
    Inode addresses:
    FINFO for inode: 122536 version 1 nblocks 6 lastlength 8192
	4610	4611	4612	4613	4614	4615
Segment Summary Info at 0x66c31
    next     0x67000	nfinfo   1	ninos    0	flags    --
    sumsum   0x35ce	datasum  0x1c70	create   Mon May  2 12:40:52 2005
    roll_id  760e0584   serial   4277
    Inode addresses:
    FINFO for inode: 122536 version 1 nblocks 8 lastlength 8192
	4616	4617	4618	4619	4620	4621	4622	4623
Segment Summary Info at 0x66c72
    next     0x67000	nfinfo   1	ninos    0	flags    --
    sumsum   0x7859	datasum  0x51e4	create   Mon May  2 12:40:52 2005
    roll_id  760e0584   serial   4278
    Inode addresses:
    FINFO for inode: 122536 version 1 nblocks 8 lastlength 8192
	4624	4625	4626	4627	4628	4629	4630	4631
Segment Summary Info at 0x66cb3
    next     0x67000	nfinfo   5	ninos    4	flags    DC
    sumsum   0x71be	datasum  0x2f1b	create   Mon May  2 12:40:52 2005
    roll_id  760e0584   serial   4279
    Inode addresses:	0x66d0c {122536v1, 121959v1, 121968v1, 122005v1}
    FINFO for inode: 122536 version 1 nblocks 11 lastlength 8192
	4632	4633	4634	4635	4636	4637	4638	4639
	-4108	-2060	-2061
    FINFO for inode: 121959 version 1 nblocks 1 lastlength 2048
	0
    FINFO for inode: 121968 version 1 nblocks 1 lastlength 1024
	0
    FINFO for inode: 122005 version 1 nblocks 1 lastlength 1024
	0
    FINFO for inode: 122524 version 1 nblocks 0 lastlength 0
Segment Summary Info at 0x66d11
    next     0x67000	nfinfo   1	ninos    0	flags    --
    sumsum   0x230b	datasum  0xaa4	create   Mon May  2 12:40:52 2005
    roll_id  760e0584   serial   4280
    Inode addresses:
    FINFO for inode: 122524 version 1 nblocks 32 lastlength 8192
	4064	4065	4066	4067	4068	4069	4070	4071
	4072	4073	4074	4075	4076	4077	4078	4079
	4080	4081	4082	4083	4084	4085	4086	4087
	4088	4089	4090	4091	4092	4093	4094	4095
Segment Summary Info at 0x66e12
    next     0x67000	nfinfo   1	ninos    0	flags    --
    sumsum   0xd7bd	datasum  0xee32	create   Mon May  2 12:40:52 2005
    roll_id  760e0584   serial   4281
    Inode addresses:
    FINFO for inode: 122524 version 1 nblocks 61 lastlength 8192
	4096	4097	4098	4099	4100	4101	4102	4103
	4104	4105	4106	4107	4108	4109	4110	4111
	4112	4113	4114	4115	4116	4117	4118	4119
	4120	4121	4122	4123	4124	4125	4126	4127
	4128	4129	4130	4131	4132	4133	4134	4135
	4136	4137	4138	4139	4140	4141	4142	4143
	4144	4145	4146	4147	4148	4149	4150	4151
	4152	4153	4154	4155	4156

--B7FF+pnjRCrp0ISR
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="lfsclean.lose.uniq"

   2 		blocks_written 0
   1 		seg_error      1
   1 		seg_error      29
   2 		segs_cleaned   0
   2 		segs_empty     0
   2 		util_sos        0.00
  30  (2 of 1850 segments available, avail = -259820, bfree = 1371253)
  29 clean_segment: 0 inodes 0 indirect -> 8192 bytes + 0 = 8192 total (to save 0)
 507 lfs_cleanerd in free(): warning: junk pointer, too high to make sense.
   1 lfs_cleanerd[1305]: 		avg util: 0.00 std dev:  0.000000
   1 lfs_cleanerd[1305]: 		util_tot        0.00
  29 lfs_cleanerd[1305]: Bad magic number: 0x0 instead of 0x61561
   1 lfs_cleanerd[1305]: Cleaner Running  at Mon May  2 15:55:48 2005
   1 lfs_cleanerd[1305]: Cleaner Running  at Mon May  2 15:55:49 2005
   2 lfs_cleanerd[1305]: Cleaner Running  at Mon May  2 15:55:50 2005
   2 lfs_cleanerd[1305]: Cleaner Running  at Mon May  2 15:55:51 2005
   1 lfs_cleanerd[1305]: Cleaner Running  at Mon May  2 15:55:54 2005
   1 lfs_cleanerd[1305]: Cleaner Running  at Mon May  2 15:55:55 2005
   1 lfs_cleanerd[1305]: Cleaner Running  at Mon May  2 15:55:56 2005
   1 lfs_cleanerd[1305]: Cleaner Running  at Mon May  2 15:55:57 2005
   1 lfs_cleanerd[1305]: Cleaner Running  at Mon May  2 15:55:58 2005
   1 lfs_cleanerd[1305]: Cleaner Running  at Mon May  2 15:55:59 2005
   1 lfs_cleanerd[1305]: Cleaner Running  at Mon May  2 15:56:00 2005
   1 lfs_cleanerd[1305]: Cleaner Running  at Mon May  2 15:56:01 2005
   1 lfs_cleanerd[1305]: Cleaner Running  at Mon May  2 15:56:02 2005
   1 lfs_cleanerd[1305]: Cleaner Running  at Mon May  2 15:56:03 2005
   1 lfs_cleanerd[1305]: Cleaner Running  at Mon May  2 15:56:04 2005
   1 lfs_cleanerd[1305]: Cleaner Running  at Mon May  2 15:56:05 2005
   1 lfs_cleanerd[1305]: Cleaner Running  at Mon May  2 15:56:07 2005
   1 lfs_cleanerd[1305]: Cleaner Running  at Mon May  2 15:56:08 2005
   1 lfs_cleanerd[1305]: Cleaner Running  at Mon May  2 15:56:09 2005
   1 lfs_cleanerd[1305]: Cleaner Running  at Mon May  2 15:56:10 2005
   1 lfs_cleanerd[1305]: Cleaner Running  at Mon May  2 15:56:11 2005
   1 lfs_cleanerd[1305]: Cleaner Running  at Mon May  2 15:56:13 2005
   1 lfs_cleanerd[1305]: Cleaner Running  at Mon May  2 15:56:14 2005
   1 lfs_cleanerd[1305]: Cleaner Running  at Mon May  2 15:56:15 2005
   1 lfs_cleanerd[1305]: Cleaner Running  at Mon May  2 15:56:16 2005
   1 lfs_cleanerd[1305]: Cleaner Running  at Mon May  2 15:56:18 2005
   1 lfs_cleanerd[1305]: Cleaner Running  at Mon May  2 15:56:19 2005
   1 lfs_cleanerd[1305]: Cleaner Running  at Mon May  2 15:56:20 2005
   1 lfs_cleanerd[1305]: Cleaner starting on filesystem /mnt
  29 lfs_cleanerd[1305]: Cleaning segment 411 (of 1 choices)
  29 lfs_cleanerd[1305]: Current usage: maxrss=0, idrss=0, isrss=0
  29 lfs_cleanerd[1305]: Warning: invalid segment summary at 0x66d09
  29 lfs_cleanerd[1305]: add_segment failed segment 411: Undefined error: 0
  29 lfs_cleanerd[1305]: add_segment: lfs_segmapv failed for segment 411
  29 lfs_cleanerd[1305]: adding segment 411: contains 1024 bytes
  28 lfs_cleanerd[1305]: coalescing already in progress
   1 lfs_cleanerd[1305]: lfs_cleanerd:	blocks_read    0
   1 lfs_cleanerd[1469]: 		avg util: 0.00 std dev:  0.000000
   1 lfs_cleanerd[1469]: 		util_tot        0.00
   1 lfs_cleanerd[1469]: ino 1 markv 821 blocks
   1 lfs_cleanerd[1469]: ino 1 total discontinuity 127 (12402845) for 826 blocks
   1 lfs_cleanerd[1469]: lfs_cleanerd:	blocks_read    0
   1 lfs_cleanerd[1469]: new coalescing process, pid 1469

--B7FF+pnjRCrp0ISR--

--E9SPMlsjsjqOlA3h
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (NetBSD)

iD8DBQFCdrfYEAVxvV4N66cRAvUWAJwPAbFLSYAeONHoQlDnV3qwRoPHigCeL1du
JfzXfYL4fQcskuVggKA0E+A=
=if6U
-----END PGP SIGNATURE-----

--E9SPMlsjsjqOlA3h--