tech-kern archive


How to go on with ISO 9660 large file support ?



Hi,

I now have working support for large data files in cd9660.
Shall we discuss it here first, or shall I submit a large PR which
motivates the change, assesses the situation at length, presents the
model change, and proposes a patch?

Overview:

The goal is to let cd9660 recognize files with multiple file sections
and represent their multiple directory records as a single vnode with
a uniform byte space.

There shall be no duplicate filenames presented to VFS. If files with
equal names are found, then only the last one of those with the highest
ISO 9660 version number will be visible.
This guarantee depends on properly sorted ISO 9660 directories.
The decision which runs of consecutive directory records form a single
file depends on properly set ISO 9660 Multi-Extent flag bits.
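
To illustrate the grouping rule, here is a minimal sketch, not taken from
the patch. In ISO 9660 (ECMA-119) the Multi-Extent bit is bit 7 of the
File Flags byte of a directory record. It is set on every record of a file
except the last one. The helper name count_file_sections() and the flat
array of flag bytes are assumptions made only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* ISO 9660 File Flags bit 7: more records of the same file follow. */
#define ISO_FLAG_MULTI_EXTENT 0x80

/*
 * Hypothetical helper: flags[] holds the File Flags bytes of consecutive
 * directory records. Return how many records, starting at index first,
 * belong to one file. Return 0 if the directory ends while the flag is
 * still set, i.e. the flag bits are not properly set.
 */
static size_t
count_file_sections(const uint8_t *flags, size_t nrecords, size_t first)
{
	size_t i;

	for (i = first; i < nrecords; i++) {
		if ((flags[i] & ISO_FLAG_MULTI_EXTENT) == 0)
			return i - first + 1;	/* last section found */
	}
	return 0;	/* unterminated run of Multi-Extent records */
}
```

A reader of the directory would then present the first record of such a
run as the file and skip the remaining records of the run.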


The filesystem specific vnode.v_data struct iso_node needed a change
to represent the 1:n relation between file and file section.
This change caused adjustments all over the code of cd9660.
It makes nearly full use of the 64 bits of NetBSD's ino_t and
employs malloc(9) or kmem(9) memory for files with more than one
section. (Currently 12 bytes per file section.)

ABI compatibility of the changed struct iso_node is guaranteed for
systems with up to 96 bit pointer types.
The API of cd9660_node.h is not compatible, in my current
implementation. An alternative implementation is possible with
100 % API/ABI compatibility for single-section files.
It would cause 4 to 8 bytes size increase of struct iso_node.

fstat(1) and pmap(1) include <isofs/cd9660/cd9660_node.h>.

Several implementations of interface methods are affected:

- cd9660_readdir() serving as VOP_READDIR(9)

  The case of mount -o norrip,nogens already used a delivery function
  with delayed file candidates: cd9660_vnops.c : iso_shipdir().
  Originally its only task was to find the youngest version of an
  ISO 9660 data file. Now it also counts the follow-up directory
  records of the same file and skips over them.

- cd9660_lookup() as VOP_LOOKUP(9)

  mount -o norrip returned the last record of matching name,
  whereas -o rrip returned the first matching record.
  Now norrip,nogens with a healthy ISO 9660 filesystem drops only older
  versions of the same name.
  All three filesystem interpretation types now return the ino based
  on the byte address of the first record of the winning file and on
  its number of file sections.

- cd9660_vget_internal() serving effectively as VFS_VGET(9)

  If the ino_t input parameter indicates a number of file sections larger
  than one, then the created iso_node gets malloced iso_node.iso_fsect.many
  as type M_ISOFSFSECT.
  The iso_node.i_number will indicate a file section count larger than 1
  only if such memory is attached to the iso_node and valid.

- cd9660_bmap() as VOP_BMAP(9)

  Nothing changes for files with a single file section.
  Those with more file sections need a loop to find the section which
  holds the desired block. As in the case of a single section, the
  last section will be the base of the resulting block address,
  regardless of whether its size includes the desired block.
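
The section lookup described for cd9660_bmap() can be sketched roughly as
follows. This is an illustration under assumptions, not the code of the
patch: struct fsect, its members, and fsect_bmap() are hypothetical names,
and section sizes are given in filesystem blocks for simplicity.

```c
#include <stdint.h>

/* Hypothetical per-section record; the real layout may differ. */
struct fsect {
	uint32_t start_block;	/* first filesystem block of the section */
	uint32_t nblocks;	/* size of the section in filesystem blocks */
};

/*
 * Map logical block lbn of the file to a filesystem block number.
 * nsect must be at least 1. If lbn lies beyond the end of the file,
 * the result is still based on the last section, as with one section.
 */
static uint64_t
fsect_bmap(const struct fsect *sect, unsigned nsect, uint64_t lbn)
{
	unsigned i;

	/* Walk all sections except the last one. */
	for (i = 0; i + 1 < nsect; i++) {
		if (lbn < sect[i].nblocks)
			break;			/* block lies in section i */
		lbn -= sect[i].nblocks;		/* skip over section i */
	}
	/* sect[i] holds the block, or is the last section as fallback. */
	return (uint64_t)sect[i].start_block + lbn;
}
```

With a single section the loop body never runs, so that case costs no
more than the plain addition it replaces.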


The changes are tested with an ISO 9660 filesystem which contains a
large data file and examples of the Rock Ridge POSIX file types regular,
directory, block device, fifo, and symbolic link. To test block device
functionality, the block device node has to be created specifically for
a locally readable device file.
The Rock Ridge view of the filesystem as listed by xorriso:
  dr-x------    1 1000     0               0 May  6 15:31 '/'
  dr-x------    1 1000     0               0 May  3 14:58 '/dev'
  prw-------    1 1000     0               0 May 24 14:29 '/dev/test.fifo'
  br--------    1 1000     5            0,12 May 14 14:33 '/dev/wd1e'
  dr-x------    1 1000     1000            0 May  6 15:30 '/my'
  -r--------    1 1000     1000     4329375744 May  6 15:30 '/my/large_file'
  dr-x------    1 1000     0               0 Jan 19 14:41 '/reg'
  -r-x------    1 1000     0          133411 Jan 19 14:41 '/reg/tar'
  lr-x------    1 1000     0               0 May 24 14:29 '/reg/to_regfile' -> 'tar'
  -r--------    1 1000     1000            6 May  6 15:34 '/small_file'
A script is available for creating, mounting, and testing this
filesystem. It is not trivially portable to other computers because
several individual adaptations have to be made.
Its result varies between
  #######################
  # BAD TEST RESULTS: 8 #
  #######################
and
  +++++++++++++++
  + All is well +
  +++++++++++++++
I will publish the script and be ready to support its conversion
into an automatic test of what can be tested in general.

There is also an ISO image emerging which (by hex editor) exposes
exotic or even illegal situations. 6.1.3 and the host operating system
show interesting effects when mounting it.


Remaining restrictions:

- ISO 9660 allows a file to be composed of multiple file sections
  with sizes which are not aligned to the filesystem block size.
  cd9660 demands that all but the last file section of a file must
  have sizes which are multiples of the block size. Usually 2 KiB.

- cd9660 imposes a deliberate limit of 128 on the number of sections
  per file. CD9660_FSECT_MAX can be adjusted in cd9660_node.h.
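
The two restrictions amount to a simple acceptance test per file. The
following is only a sketch of that rule. CD9660_FSECT_MAX and its value
128 are from the text above, whereas fsect_sizes_ok() and its parameters
are hypothetical names chosen for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define CD9660_FSECT_MAX 128	/* limit named in the text above */

/*
 * Hypothetical check: size[] holds the byte sizes of the file sections
 * in directory order, blksize is the filesystem block size (usually
 * 2048). Return 1 if the file is acceptable under both restrictions,
 * else 0.
 */
static int
fsect_sizes_ok(const uint32_t *size, size_t nsect, uint32_t blksize)
{
	size_t i;

	if (nsect == 0 || nsect > CD9660_FSECT_MAX || blksize == 0)
		return 0;
	/* All sections but the last must be multiples of the block size. */
	for (i = 0; i + 1 < nsect; i++) {
		if (size[i] % blksize != 0)
			return 0;
	}
	return 1;
}
```

Only the last section may end inside a block, which is why the loop stops
one entry before the end of the array.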

Remaining problems:

- The name comparison for finding identical names is still not
  fully in sync underneath VOP_READDIR(9) and VOP_LOOKUP(9).
  It is done by two different functions in cd9660_util.c where I see
  incompatibilities in cases of non-compliant version suffixes.

- I could not yet find ISO images or software which would provide
  test opportunities for ISO 9660 Associated Files or Extended
  Attributes (which are not related to getextattr(1)/extattr(9)).

- Some code paths are not as clear as they could be. I restricted
  myself to augmenting the existing code for the 1:n relation.
  Some code used the opposite assumption (one record per file) to
  take shortcuts. So far I tackled only the assumptions, not the
  shortcuts. So some shortcuts became quite curvy and not so short
  any more.

About the inode number inflation:

  Large data files get giant inode numbers, because the file section
  count is encoded above bit 48 of ino_t.
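
  A minimal sketch of such an encoding, under assumptions: the lower
  48 bits carry the byte address of the first directory record, the bits
  from 48 upward carry the section count minus one, so that single-section
  files keep small numbers. The exact bit layout of the patch is not
  shown here. The names ino_pack(), ino_rec_addr(), and ino_nsect() are
  hypothetical, and uint64_t stands in for the 64 bit ino_t.

```c
#include <assert.h>
#include <stdint.h>

#define INO_ADDR_BITS	48
#define INO_ADDR_MASK	((UINT64_C(1) << INO_ADDR_BITS) - 1)

/* Combine record byte address and section count into one number. */
static uint64_t
ino_pack(uint64_t rec_addr, unsigned nsect)
{
	assert(nsect >= 1 && nsect <= 0x10000);	/* must fit in 16 bits */
	assert(rec_addr <= INO_ADDR_MASK);
	return ((uint64_t)(nsect - 1) << INO_ADDR_BITS) | rec_addr;
}

/* Byte address of the first directory record of the file. */
static uint64_t
ino_rec_addr(uint64_t ino)
{
	return ino & INO_ADDR_MASK;
}

/* Number of file sections, always at least 1. */
static unsigned
ino_nsect(uint64_t ino)
{
	return (unsigned)(ino >> INO_ADDR_BITS) + 1;
}
```

  With this layout a function that receives only the number can recover
  both the record address and the count, which is what a VFS_VGET(9)
  style lookup by inode number needs.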

  The strongest reason why this information has to be encoded in ino_t
  is the desire to implement method VFS_VGET(9). If VOP_LOOKUP(9) were
  the only method which leads to creation of a vnode, then the address
  and count could be stored in some other members of struct iso_node.
  Letting VFS_VGET(9) simply return EOPNOTSUPP would open this path.

  One could cut inode numbers to 32 bit and then port the cd9660
  improvements to FreeBSD. (Not that freebsd-hackers would be much
  interested in cd9660.)


Have a nice day :)

Thomas


