Subject: Re: large inode numbers
To: David Laight <david@l8s.co.uk>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 12/16/2003 18:17:39

On Wed, Dec 17, 2003 at 12:26:15AM +0000, David Laight wrote:
> On Tue, Dec 16, 2003 at 05:20:33PM +0100, Jaromir Dolecek wrote:
> > Martin Husemann wrote:
> > > I think it's time to bite the bullet and version struct dirent to give us
> > > large inode numbers. See PR kern/23773 for a trigger.
> > >
> > > Why didn't we do this a long time ago?
> >
> > Even today it's not ordinary to have a filesystem which would
> > require 4G inodes. I'm not sure what the inode size is
> > nowadays, but you'd need at least 2048GB (with 512B per inode)
> > just to store the inode data.

Jaromir: "XFS".
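
For reference, the arithmetic behind the 2048GB figure quoted above can be
checked in a few lines. This is only a sketch of the quoted estimate: the
512 bytes per inode is the number from the quoted text, and the helper name
is made up for illustration.

```python
# Back-of-the-envelope check of the quoted estimate.
# Assumption: 512 bytes of on-disk data per inode, as stated in the quote.
INODE_COUNT = 2**32        # "4G": the number of distinct 32-bit inode numbers
BYTES_PER_INODE = 512

def inode_table_bytes(count: int, bytes_per_inode: int) -> int:
    """Space needed just to hold `count` inodes (hypothetical helper)."""
    return count * bytes_per_inode

total = inode_table_bytes(INODE_COUNT, BYTES_PER_INODE)
print(total // 2**30, "GB")   # 2**41 bytes, i.e. 2048 GB
```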

> Well the default is 4 fragments per inode, you only need FFSv2 for
> more than 2^31 fragments - so any filesystem that needs FFSv2 is
> close to needing more than 2^31 inodes.
> (or if 2^31 inodes is way plenty, why do we need 2^31 fragments?)
>
> If hacking struct dirent, add that inode use number (or whatever it is called)
> then you never need to synchronously update directory entries.

Huh?

> > It _seems_ the msdosfs problem could be solved by different
> > fileid calculation - for example using disk block number of first block
> > in a file as 'inode' number would work up to 2048GB at least (with
> > 512B block size). Maybe even beyond 2048GB - IIRC FAT32 can address
> > 4G blocks maximum, so it would probably force bigger block size
> > for >2048GB filesystems (if it's supported at all).
>
> The file allocations are in multi-sector units. So you get >2048GB for free.
> Empty files are a problem - the entire disk could be full of directory
> entries for zero sized files.
>
> OTOH do we guarantee unique inode numbers for all filesystems?
> IIRC union mounts don't generate unique numbers - and mkisofs can get confused.

Nit: there are union mounts, and there is unionfs. They are different.

unionfs does not guarantee unique inode numbers, as it can't. Files either
exist in the upper or the lower file system (as seen through unionfs), so
to have unique inode numbers, the two file systems would have to be
coordinated.
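
A toy model of the collision, to make the point concrete. Nothing here is
unionfs code: the two dicts stand in for two file systems that number their
inodes independently, the file names and numbers are invented, and hard
links (which legitimately share an inode number) are ignored.

```python
# Toy model only: each layer maps name -> inode number, numbered
# independently, as two unrelated file systems would be.
upper = {"a.txt": 2, "b.txt": 3}
lower = {"c.txt": 2, "d.txt": 5}

def union_view(upper, lower):
    """Merged namespace: upper entries hide same-named lower entries."""
    view = dict(lower)
    view.update(upper)
    return view

def colliding_inodes(view):
    """Return the inode numbers that more than one visible name maps to."""
    seen = {}
    for name, ino in view.items():
        seen.setdefault(ino, []).append(name)
    return {ino: sorted(names) for ino, names in seen.items() if len(names) > 1}

collisions = colliding_inodes(union_view(upper, lower))
print(collisions)   # {2: ['a.txt', 'c.txt']}
```

Two unrelated files show up with inode number 2, and neither layer can know
about it without looking at the other.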

More damaging to the concept of unique inode numbers is the fact that the
inode number of a file can change. If a file exists only on the lower file
system and is then written, it will be moved up to the upper file system. The
two versions of the file can have different inode numbers, so stat(2) calls
at different times will give different answers.
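
Seen from userland, the effect looks roughly like the sketch below. It does
not use unionfs at all: writing a new file and renaming it over the old name
is only a stand-in for the move to the upper layer, and both helper names
are hypothetical. It assumes an ordinary file system where two files that
exist at the same time have different inode numbers.

```python
import os
import tempfile

def file_identity(path):
    """Return the (st_dev, st_ino) pair that stat(2) reports for path."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)

def simulate_copy_up(path, data):
    """Stand-in for the move to the upper layer: write a new file and
    rename it over the old name, so the name now refers to a new inode."""
    tmp = path + ".new"
    with open(tmp, "wb") as f:
        f.write(data)
    os.replace(tmp, path)

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "file")
    with open(path, "wb") as f:
        f.write(b"lower")
    before = file_identity(path)
    simulate_copy_up(path, b"upper")
    after = file_identity(path)
    changed = before != after
    print("identity changed:", changed)
```

Any program that remembers st_ino to recognise a file later (backup tools,
hard link detection and the like) would treat the file as a different one
after the write.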

Take care,

Bill
