NetBSD-Bugs archive

kern/48797: Scale down cd9660 inode numbers by a factor of 32



>Number:         48797
>Category:       kern
>Synopsis:       Scale down cd9660 inode numbers by a factor of 32
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          change-request
>Submitter-Id:   net
>Arrival-Date:   Fri May 09 18:15:00 +0000 2014
>Originator:     Thomas Schmitt
>Release:        6.1.3
>Organization:
>Environment:
NetBSD netbsd 6.1.3 NetBSD 6.1.3 (GENERIC) i386
>Description:
This proposal depends on the fix for my previous bug report
kern/48787.

The inode numbers of sys/fs/cd9660 are computed from the byte
addresses of their ISO 9660 directory records. With
single-session media, they are normally in the range from 36864
to a few million.

But multi-session media have their directory records stored
in the youngest session. With DVD DL and BD media it is likely
that this session starts above 4 GiB.

After fixing the 32 bit rollover in cd9660_node.c:isodirino(),
and thus making my test ISO mountable at all, ls -i shows the
inode numbers rolled over at 32 bits.

  netbsd# ls -li /mnt 
  total 4
  34578432 drwxr-xr-x  1 thomas  dbus  2048 May  6 15:30 my
  34574670 -rw-r--r--  1 thomas  dbus     6 May  6 15:34 small_file

whereas
  struct stat stbuf;
  ...
    printf("sizeof(ino_t) = %d\n", (int) sizeof(ino_t));
    ...
    ret = stat("/mnt/small_file", &stbuf);
    ...
    printf("/mnt/small_file , ino = %.f\n", (double) stbuf.st_ino);
reports

  sizeof(ino_t) = 8
  /mnt/small_file , ino = 4329541966

Most probably other callers of stat(2) are not prepared for
such large inode numbers either.

ISO 9660 prescribes a minimum size of 34 bytes for a directory
record. So the computed inode numbers would stay unique even
if divided by this number. [1]

A peculiar use of inode numbers inside fs/cd9660 makes it
necessary to losslessly encode a 2048-block address in an
inode number. 2048 is a multiple of 32 but not of 34, so it
seems best to shift by 5 bits rather than making use of the
full scale factor of 34.
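
The arithmetic can be sketched as follows. This is not the code
of my patch: only CD9660_COMPUTE_INO_SHIFT is a name from the
patch, while EX_BLOCK_SHIFT and the two helper functions are
hypothetical names chosen for this illustration. EX_BLOCK_SHIFT
stands in for imp->im_bshift with 2048-byte blocks.

```c
#include <stdint.h>

#define CD9660_COMPUTE_INO_SHIFT 5	/* shift value proposed in this report */
#define EX_BLOCK_SHIFT 11		/* hypothetical: log2(2048) */

/*
 * Hypothetical helper: inode number of a directory record, computed
 * from its block number and its byte offset inside that block.
 * All intermediate values are forced to 64 bit.
 */
static inline uint64_t
ex_ino_from_record(uint64_t block, uint64_t offset)
{
	uint64_t byte_addr = (block << EX_BLOCK_SHIFT) + offset;

	return byte_addr >> CD9660_COMPUTE_INO_SHIFT;
}

/*
 * Hypothetical helper: block number of a directory extent, deduced
 * from the directory's inode number. Lossless as long as the shift
 * does not exceed EX_BLOCK_SHIFT.
 */
static inline uint64_t
ex_block_from_dir_ino(uint64_t ino)
{
	return ino >> (EX_BLOCK_SHIFT - CD9660_COMPUTE_INO_SHIFT);
}
```

With the numbers from the ls output above, directory "my" has
the byte address 34578432 + 2^32 = 4329545728, which is block
2114036. Its scaled inode number would be 135298304, and
shifting that by 6 bits gives back block 2114036. Two records
that are at least 34 bytes apart always land in different
32-byte units, so their scaled numbers differ.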

This will keep the inode numbers in the 32 bit range for
session start addresses (or media sizes) of up to 128 GiB.
There are recordable Blu-ray media approaching this limit.
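
The 128 GiB figure follows from 2^32 * 2^5 = 2^37 bytes. As a
small check, with a hypothetical helper name:

```c
#include <stdint.h>

/*
 * Hypothetical helper: lowest byte address whose inode number no
 * longer fits into 32 bit after shifting right by 'shift' bits.
 */
static inline uint64_t
ex_first_overflow_addr(unsigned int shift)
{
	return (UINT64_C(1) << 32) << shift;
}
```

ex_first_overflow_addr(0) is 4 GiB, the limit of the unscaled
numbers, and ex_first_overflow_addr(5) is 137438953472 bytes,
i.e. 128 GiB.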

cd9660 could deliberately let the numbers roll over at 32 bits
if it did not pass the block address of directory extents via
the inode number of the directory's iso_node.
  sys/fs/cd9660/cd9660_vfsops.c line 815:
                ip->iso_start = ino >> imp->im_bshift;
(It could remember a global offset so that this computation
 works with rolled-over inode numbers.)
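
Such an offset scheme is not part of my patch. Purely as an
illustration of the idea, and under the assumption that all
directory records lie less than 4 GiB above a remembered base
address (for example the start of the youngest session), the
full byte address could be recovered like this. The function
name and the parameter base are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: recover the 64 bit byte address from an inode
 * number that was rolled over at 32 bit.
 * Assumption: base <= address < base + 2^32.
 */
static inline uint64_t
ex_unroll_addr(uint32_t ino32, uint64_t base)
{
	/* Distance from base, computed modulo 2^32. */
	uint32_t delta = ino32 - (uint32_t)base;

	return base + delta;
}
```

The block address would then be the result shifted right by
imp->im_bshift, as in the line quoted above.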

--------------------------------------------------------------
Footnote [1]:

The earliest observers of this fact that I know of are
probably Sergey Vlasov and Eric Lammerts as mentioned in
  /usr/src/linux/fs/isofs/isofs.h

I hereby affirm that I did not look at the implementation there
but only remembered the idea from a code inspection years ago.
That was when I implemented RRIP inode numbers for hardlinks
in libisofs and wondered why they did not show up on Linux.
Only after implementing my proposed macros did I look up
the comment above isofs_get_ino() for proper attribution.

My macros are derived from the computation code found in
the NetBSD kernel source. In particular, they are paranoid
about enforcing 64 bit word size. Linux, by contrast, takes
care to keep intermediate results of the computation within
32 bits as long as possible.

>How-To-Repeat:
The effect on directory records located above the 4 GiB limit
may be studied with the test image for kern/48787:

  http://scdbackup.webframe.org/large.iso.bz2

Just 4470 bytes, MD5 7d78dc3efaec8ea3f1801335329f410d.
It inflates to 4,329,897,984 bytes. Provided under BSD license.

>Fix:
My patch implements the shifting and proper reverse computation
by introducing macros for computation of inode numbers and a
macro for deducing the block number of the directory extent from
a directory's inode number.

The main risk with this change is that I might not have caught
all places where an inode number is computed. In the worst case
this could yield varying inode numbers for the same inode, or
number collisions in large directory trees.

The scaling can be disabled by changing the shift value in
cd9660_node.h from 5 to 0:
  #define CD9660_COMPUTE_INO_SHIFT 0
This should yield the same computation as is currently
implemented in the kernel sources.

So if suspicion arises that this change causes problems by a
forgotten non-scaling computation, it would be quite easy to verify.


