tech-kern archive


mount_union(8) vs. open(O_RDWR)



I've been experimenting with more complete install.img and cdrom ISOs,
i.e. with complete tools and filesystems, and in doing so I've come up
with (what I think is) a better way to expose and use a fully populated
/var when the underlying disk media is read-only (which even install.img
could be).

So to do this my experimental installer's /etc/rc includes:

    mount -t tmpfs tmpfs /tmp
    mkdir /tmp/uvar
    mount -t union /tmp/uvar /var

So far, so good.  Everything looks exactly right, and I can manually
create files in /var/tmp, for example.

# df
Filesystem           1K-blocks         Used        Avail %Cap Mounted on
/dev/xbd4a             2720338      2720338            0 100% /
tmpfs                     2016          956         1060  47% /dev
tmpfs                  3904336            8      3904328   0% /tmp
tmpfs                  3904336            8      3904328   0% /etc
<above>:/tmp/uvar      6624672      2720344      3904328  41% /var
# mount
/dev/xbd4a on / type cd9660 (read-only, local)
tmpfs on /dev type tmpfs (union, local)
tmpfs on /tmp type tmpfs (local)
tmpfs on /etc type tmpfs (union, local)
<above>:/tmp/uvar on /var type union (local)

(As a side note, /var has to be a full union filesystem, not just a
mount with the union option, since "the union option affects the file
system name space only at the mount point itself; it does not apply
recursively to subdirectories", and /var of course has several
subdirectories, most of which need to be writable.  /dev might also need
to be a real union filesystem rather than just a union mount, since it
also contains subdirectories, in case anything needs to be created
within one of them.)

But I then find some odd error messages from init on the console:

    init: can't add utmpx record for `system boot': Bad file descriptor
    init: can't add utmpx record for `runlevel': Bad file descriptor
    init: can't add utmpx record for `console': Bad file descriptor

Note you can ignore the EBADF; as far as I can see it comes from an
over-written errno.  The underlying errno is (as would be expected)
EROFS, which ktrace confirms on related tools accessing the same file,
e.g. who (which, BTW, will write to an empty utmpx file when run as
root!).  Init reports these errors from calls to pututxline(), which
calls getutxent(), which in turn tries to open the utmpx file with
fopen("re+") (and then with "we+", before giving up and using "re").
Under the hood the first two translate, of course, into
open(O_RDWR|otherbits), as '+' always upgrades the open mode to O_RDWR.
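
For anyone who wants to check that translation, here is a rough sketch of
the usual mapping from an fopen() mode string to open(2) flags.  It is
written in Python only for brevity; the function name fopen_mode_to_flags
is made up for this illustration, and treating 'e' as O_CLOEXEC is my
assumption about what the "otherbits" are.

```python
import os


def fopen_mode_to_flags(mode: str) -> int:
    """Return the open(2) flags implied by a C fopen() mode string.

    Hypothetical helper for illustration; only 'r', 'w', 'a', '+' and
    'e' are handled ('b' and anything else after the first letter is
    ignored).
    """
    if not mode or mode[0] not in "rwa":
        raise ValueError("unsupported fopen mode: %r" % (mode,))
    # '+' anywhere after the first letter upgrades the access mode.
    access = os.O_RDWR if "+" in mode[1:] else None
    if mode[0] == "r":
        flags = access if access is not None else os.O_RDONLY
    elif mode[0] == "w":
        flags = (access if access is not None else os.O_WRONLY)
        flags |= os.O_CREAT | os.O_TRUNC
    else:  # "a"
        flags = (access if access is not None else os.O_WRONLY)
        flags |= os.O_CREAT | os.O_APPEND
    # Assumption: 'e' requests close-on-exec.
    if "e" in mode[1:]:
        flags |= os.O_CLOEXEC
    return flags


if __name__ == "__main__":
    # The three modes getutxent() is described as trying, in order.
    for mode in ("re+", "we+", "re"):
        flags = fopen_mode_to_flags(mode)
        kind = "read-write" if flags & os.O_RDWR else "read-only"
        print(mode, kind)
```

The point is only that both "re+" and "we+" end up asking for O_RDWR, so
both count as opening for writing as far as the filesystem is concerned.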

So, this is surprising!  (see below for why, if it's not already obvious)

The shell also can't write to, or truncate, files from the underlying
filesystem either:

    # echo -n >> /var/run/utmpx
    sh: cannot create /var/run/utmpx: read-only file system
    # echo -n > /var/run/utmpx
    sh: cannot create /var/run/utmpx: read-only file system

Also very surprising, at least to me!

Curiously touch(1) will cause the file to be mirrored in the upper
layer, yet it only calls utimensat(2) (not open())!

Now once the file is mirrored in the upper layer then init, who,
etc. (and shell redirection to the utmpx file) all open the new writable
union copy and work without complaint.
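
If anyone wants to reproduce the observations above without ktrace, a
small probe along the following lines should do.  This is only a sketch:
the helper names (try_open, try_utime, probe) are made up here, the
append-mode open is merely close to what the shell's ">>" does (the shell
also passes O_CREAT), and os.utime() standing in for touch(1) assumes it
ends up in utimensat(2).  The timestamp update is deliberately done last,
since on my system it is the step that mirrors the file into the upper
layer and would change the results of the opens.

```python
import errno
import os
import sys


def _errname(e: OSError) -> str:
    """Map an OSError to its symbolic errno name, e.g. "EROFS"."""
    return errno.errorcode.get(e.errno, str(e.errno))


def try_open(path: str, flags: int) -> str:
    """Return "ok" if open(2) with these flags succeeds, else the errno name."""
    try:
        fd = os.open(path, flags)
    except OSError as e:
        return _errname(e)
    os.close(fd)
    return "ok"


def try_utime(path: str) -> str:
    """Set the file's timestamps to now; return "ok" or the errno name."""
    try:
        os.utime(path)
    except OSError as e:
        return _errname(e)
    return "ok"


def probe(path: str) -> dict:
    """Try each operation in turn; nothing here writes any file data."""
    return {
        "open(O_RDONLY)": try_open(path, os.O_RDONLY),
        "open(O_RDWR)": try_open(path, os.O_RDWR),
        "open(O_WRONLY|O_APPEND)": try_open(path, os.O_WRONLY | os.O_APPEND),
        # Last on purpose: this may copy the file to the upper layer.
        "utime": try_utime(path),
    }


if __name__ == "__main__":
    for path in sys.argv[1:]:
        for name, result in probe(path).items():
            print("%s: %s: %s" % (path, name, result))
```

Run it against /var/run/utmpx on a freshly booted system, and then again
after the touch(1) step, to compare the two states.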

Note that new unique files (without any underlying read-only original)
are, as expected, created without any problem (which is why the example
of /var/obj from mount_union(8) works as advertised).

So I think this is surprising because mount_union(8) says:

     Requests to create or modify objects in uniondir are passed to the upper
     layer with the exception of a few special cases.  An attempt to open for
     writing a file which exists in the lower layer causes a copy of the
     entire file to be made to the upper layer, and then for the upper layer
     copy to be opened.  Similarly, an attempt to truncate a lower layer file
     to zero length causes an empty file to be created in the upper layer.
     Any other operation which would ultimately require modification to the
     lower layer fails with EROFS.
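
To make my reading of that paragraph concrete, here it is written out as
a tiny decision function, next to what I actually observe.  This is just
a model of the manual page text, not of the kernel code; the function and
operation names are invented for the illustration, and putting
utimensat() under "any other operation" is my own interpretation.

```python
def documented_outcome(op: str, only_in_lower: bool) -> str:
    """What mount_union(8), as quoted above, says should happen.

    op is a label for the request; only_in_lower says whether the file
    exists in the lower layer but not (yet) in the upper layer.
    """
    if not only_in_lower:
        # General rule: create/modify requests go to the upper layer.
        return "passed to upper layer"
    if op == "open_write":
        return "copy whole file to upper layer, open the copy"
    if op == "truncate":
        return "create empty file in upper layer"
    # "Any other operation which would ultimately require modification
    # to the lower layer fails with EROFS."
    return "EROFS"


# Observations from this message, for a file that is only in the lower layer.
OBSERVED = {
    "open_write": "EROFS",
    "truncate": "EROFS",
    "utimensat": "file mirrored in upper layer",
}

if __name__ == "__main__":
    for op, seen in OBSERVED.items():
        print("%s: documented: %s; observed: %s"
              % (op, documented_outcome(op, True), seen))
```

Under that reading, all three cases behave the opposite way from what the
manual page describes.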

To me it seems as if the claimed behaviour for "an attempt to open for
writing a file which exists in the lower layer" is failing (assuming
O_RDWR counts as such an attempt), and given what the shell reports in
my example above, "an attempt to truncate a lower layer file" is failing
too.  Furthermore, although I would argue that utimensat() is also "an
attempt to open for writing" in the moral sense (especially since the
original idea of touch(1) was to read a byte and write it back), the
implementation is, strictly speaking, not an open() at all.  Perhaps it
should really fall in the category of "any other operation" and thus
actually be failing!

So are these things I find surprising actually bugs, or am I confused by
what mount_union(8) is vaguely saying?

For the record this is with a (slightly dated) 9.99.81 kernel on amd64.

--
					Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC     +1 250 762-7675           RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>     Avoncote Farms <woods%avoncote.ca@localhost>



