Subject: Re: FFS journal
To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 07/10/2006 13:16:38

On Sun, Jul 09, 2006 at 11:58:33AM +0200, Pawel Jakub Dawidek wrote:
> On Mon, Jul 03, 2006 at 06:47:43PM +0200, Pavel Cahyna wrote:
> > Hi,
> >
> > On Sun, Jul 02, 2006 at 07:59:50PM +0400, Kirill Kuvaldin wrote:
> > If an application unlinks a file which is open, the file is not deleted
> > until it is closed; until then it exists without a name. Now if the system
> > crashes after the unlink and before the close, the unnamed file is not
> > deleted and remains in the filesystem, taking up space. This is not a
> > problem in a non-journalling scenario, because after a crash fsck is run
> > and takes care of it. But a journalling filesystem should take this into
> > account.
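The situation described in the quoted message is ordinary POSIX behaviour and can be observed from userland. The sketch below is illustrative only: the helper name nlink_after_unlink() and the /tmp scratch path are assumptions, not anything from the thread. It unlinks a file while its descriptor is still open, keeps reading and writing through it, and returns the link count fstat() reports afterwards, which is expected to be 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create a scratch file, unlink it while fd is still
 * open, and return the link count fstat() reports (expected 0), or -1 on
 * any error. The data stays usable through fd; the inode and its blocks
 * are only released at the last close(). A crash before that close leaves
 * an allocated inode with no name: the orphan that fsck must reclaim. */
static int nlink_after_unlink(void)
{
    char path[] = "/tmp/orphanXXXXXX";
    char buf[6] = {0};
    struct stat st;
    int ok, fd = mkstemp(path);

    if (fd < 0)
        return -1;
    ok = unlink(path) == 0 &&               /* name is gone from the directory */
         write(fd, "hello", 5) == 5 &&      /* ...but the file is still writable */
         lseek(fd, 0, SEEK_SET) == 0 &&
         read(fd, buf, 5) == 5 &&           /* ...and readable */
         strcmp(buf, "hello") == 0 &&
         fstat(fd, &st) == 0;
    close(fd);                              /* last reference: now it may be freed */
    return ok ? (int)st.st_nlink : -1;
}
```

A journalling file system that replays only its log after a crash never revisits such an inode, which is why the thread looks for a cheap way to find it again.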
>
> Maybe you guys will find my experience helpful. I'm working on gjournal
> (a block level journaling) for FreeBSD and I needed to solve this
> problem as well.
>
> My first solution to the problem was a magic .deleted/ directory, which
> was created at mount time. Now, when an object (file or directory) was
> removed but still open, it wasn't really removed, but moved to the
> .deleted/ directory. On close, the object is removed from this directory.
> You need to ensure that such a file/directory cannot be moved back into
> the file system. On a system crash or a power failure, all you need to
> do is 'rm -rf .deleted'.
> It worked without problems, but it wasn't really nice, so I implemented
> another thing...

I actually think it's a good way to go. Let's all agree on how to find
this directory and just use it.
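To make the .deleted/ scheme concrete, here is a userland model using plain POSIX calls. It is a hedged sketch, not gjournal or FFS code: mount_time_setup(), park_unlinked(), release_parked() and purge_deleted() are hypothetical names, naming parked entries by inode number is an assumption, only regular files are handled (parked directories would need a recursive removal), and nothing here stops a parked file from being moved back, which the kernel would have to enforce.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical name; the thread had not yet agreed on one. */
#define DELETED_DIR ".deleted"

/* Mount time: make sure <root>/.deleted exists (idempotent). */
static int mount_time_setup(const char *root)
{
    char dir[4096];

    snprintf(dir, sizeof dir, "%s/%s", root, DELETED_DIR);
    return (mkdir(dir, 0700) == 0 || errno == EEXIST) ? 0 : -1;
}

/* Remove-while-open: instead of unlink(path), move the name to
 * <root>/.deleted/<inode number> so the inode keeps a findable link. */
static int park_unlinked(const char *root, const char *path)
{
    struct stat st;
    char dst[4096];

    if (stat(path, &st) != 0)
        return -1;
    snprintf(dst, sizeof dst, "%s/%s/%llu", root, DELETED_DIR,
             (unsigned long long)st.st_ino);
    return rename(path, dst);
}

/* Last close of a parked file: the name can now really go away. */
static int release_parked(const char *root, ino_t ino)
{
    char dst[4096];

    snprintf(dst, sizeof dst, "%s/%s/%llu", root, DELETED_DIR,
             (unsigned long long)ino);
    return unlink(dst);
}

/* After a crash: everything still parked is garbage. Plain files only.
 * Returns the number of entries removed, or -1 if .deleted can't be opened. */
static int purge_deleted(const char *root)
{
    char dir[4096], victim[4096 + 256];
    struct dirent *de;
    DIR *d;
    int n = 0;

    snprintf(dir, sizeof dir, "%s/%s", root, DELETED_DIR);
    if ((d = opendir(dir)) == NULL)
        return -1;
    while ((de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        snprintf(victim, sizeof victim, "%s/%s", dir, de->d_name);
        if (unlink(victim) == 0)
            n++;
    }
    closedir(d);
    return n;
}
```

Recovery here touches a single directory no matter how large the file system is, which is the property Bill is arguing for.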

> When an object is removed, but still open, I increase two counters:
> 1. fs_unref - total number of unreferenced inodes in the file system
>    (stored in file system's super-block).
> 2. cg_unref - total number of unreferenced inodes in this cylinder
>    group.
> After a system crash or a power failure, I run a faster fsck version,
> which scans only cylinder groups looking for cg_unref > 0. If it finds
> such a cylinder group, it scans all its inodes looking for those with
> linkcnt == 0. Then it just frees their blocks and marks them as
> unallocated. Of course, because of the global fs_unref counter we don't
> have to scan the whole file system, but can quit scanning once fs_unref
> goes to 0.
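The counter scheme can be modelled with a toy in-memory file system. This is a hypothetical C sketch, not gjournal's code: toy_fs, toy_cg, toy_inode and reclaim_unreferenced() are invented names, the geometry (8 cylinder groups of 4 inodes) is arbitrary, and "freeing blocks" is reduced to a counter. It assumes cg_unref lives in each cylinder group's header, so checking it means reading every cg in turn until fs_unref reaches 0.

```c
#include <assert.h>

/* Toy geometry, hypothetical and tiny so the walk is easy to follow. */
#define NCG 8   /* cylinder groups */
#define IPG 4   /* inodes per cylinder group */

struct toy_inode { int allocated; int linkcnt; int nblocks; };
struct toy_cg    { int cg_unref; struct toy_inode ino[IPG]; };
struct toy_fs    { int fs_unref; struct toy_cg cg[NCG]; };

/* Counter-guided recovery: walk the cgs in order, skip any whose cg_unref
 * is zero, free allocated inodes with linkcnt == 0 in the rest, and stop as
 * soon as fs_unref reaches zero. Returns the number of cg headers read
 * (each potentially a seek); *freed receives the number of blocks released. */
static int reclaim_unreferenced(struct toy_fs *fs, int *freed)
{
    int headers_read = 0;

    *freed = 0;
    for (int c = 0; c < NCG && fs->fs_unref > 0; c++) {
        struct toy_cg *cg = &fs->cg[c];

        headers_read++;                 /* must read the cg to see cg_unref */
        if (cg->cg_unref == 0)
            continue;
        for (int i = 0; i < IPG; i++) {
            struct toy_inode *ip = &cg->ino[i];

            if (ip->allocated && ip->linkcnt == 0) {
                *freed += ip->nblocks;  /* a real fsck returns them to the cg's free map */
                ip->nblocks = 0;
                ip->allocated = 0;
                cg->cg_unref--;
                fs->fs_unref--;
                assert(cg->cg_unref >= 0 && fs->fs_unref >= 0);
            }
        }
    }
    return headers_read;
}
```

With orphans in cgs 2 and 5 of 8, the walk reads six cg headers before fs_unref hits 0, even though only two of them contain anything to clean.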

The concern I have with something like this is that you're adding new cg_
and fs_ fields. The problem I see with this is that AFAICT ffs doesn't
handle on-disk format versioning very well. I'd rather we not add new
fields if we can't tell which fields are in use. :-|

Also, I much prefer the hidden directory idea as it directly indicates
what needs cleaning. If I have two unlinked files, I'd rather not read
half or 70% of the CGs to find the files to clean up. Think about life on
a multi-TB file system, and remember that each cg read will probably
trigger a seek, which takes on the order of milliseconds.

Put another way, chances are that almost every cg will have no unlinked
files, so a method that doesn't need to read each cg will perform better.
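The "half or 70%" figure can be checked with a short estimate. Assume (this is a modelling assumption, not something measured) that each of k orphaned inodes lands in an independent, uniformly random cylinder group out of ncg. The early-exit scan then reads every cg up to the one holding the last orphan, so the expected number of cg reads is the expected maximum of k uniform positions: the sum over m = 1..ncg of 1 - ((m-1)/ncg)^k, which tends to k/(k+1) of all cgs. The helper names below are hypothetical.

```c
#include <assert.h>

/* x^k for small non-negative k, avoiding a libm dependency. */
static double ipow(double x, int k)
{
    double r = 1.0;

    while (k-- > 0)
        r *= x;
    return r;
}

/* Expected cg reads for a scan that stops once fs_unref reaches 0, assuming
 * each of k orphans sits in an independent, uniformly random cg out of ncg.
 * E[max position] = sum over m of P(max >= m) = sum of 1 - ((m-1)/ncg)^k. */
static double expected_cg_reads(int ncg, int k)
{
    double sum = 0.0;

    assert(ncg > 0 && k >= 0);
    for (int m = 1; m <= ncg; m++)
        sum += 1.0 - ipow((double)(m - 1) / ncg, k);
    return sum;
}
```

For two orphans that is about two thirds of the cgs. With an assumed 10000 cgs and an assumed 8 ms per seek, expected_cg_reads(10000, 2) is about 6667 reads, or roughly 53 seconds, whereas a hidden directory needs roughly one directory read plus one inode read per orphan, independent of ncg.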

Take care,

Bill
