I've been experimenting with more complete install.img and cdrom ISOs, i.e. with complete tools and filesystems, and in doing so I've come up with (what I think is) a better way to expose and use a fully populated /var when the underlying disk media is read-only (which even install.img could be).

So to do this my experimental installer's /etc/rc includes:

    mount -t tmpfs tmpfs /tmp
    mkdir /tmp/uvar
    mount -t union /tmp/uvar /var

So far, so good.  Everything looks exactly right, and I can manually create files in /var/tmp, for example.

    # df
    Filesystem        1K-blocks    Used   Avail %Cap Mounted on
    /dev/xbd4a          2720338 2720338       0 100% /
    tmpfs                  2016     956    1060  47% /dev
    tmpfs               3904336       8 3904328   0% /tmp
    tmpfs               3904336       8 3904328   0% /etc
    <above>:/tmp/uvar   6624672 2720344 3904328  41% /var
    # mount
    /dev/xbd4a on / type cd9660 (read-only, local)
    tmpfs on /dev type tmpfs (union, local)
    tmpfs on /tmp type tmpfs (local)
    tmpfs on /etc type tmpfs (union, local)
    <above>:/tmp/uvar on /var type union (local)

(as a side note, note that /var has to be a full union filesystem, not just a union option to mount, since "the union option affects the file system name space only at the mount point itself; it does not apply recursively to subdirectories", but of course /var has several subdirectories, some/most of which need to be writable.  /dev might also need to be a real union filesystem too, not just a union mount, since it too contains sub-directories, i.e. just in case anything needs to be created within one of them)

But I then find some odd error messages from init on the console:

    init: can't add utmpx record for `system boot': Bad file descriptor
    init: can't add utmpx record for `runlevel': Bad file descriptor
    init: can't add utmpx record for `console': Bad file descriptor

Note you can ignore the EBADF -- that's from an over-written errno (as far as I can see).  The underlying errno is (as would be expected) EROFS (and this is confirmed with ktrace on related tools accessing the same file, e.g.
who, which BTW, when run as root will write to an empty utmpx file!).

Init reports these errors from calls to pututxline(), which calls getutxent(), and that tries to open the utmpx file with fopen("re+") (and then it tries "we+", before giving up and using "re").  Under the hood the first two translate, of course, into open(O_RDWR|otherbits) (as '+' always upgrades the open mode to O_RDWR).

So, this is surprising!  (see below for why, if it's not already obvious)

The shell can't write to, or truncate, files from the underlying filesystem either:

    # echo -n >> /var/run/utmpx
    sh: cannot create /var/run/utmpx: read-only file system
    # echo -n > /var/run/utmpx
    sh: cannot create /var/run/utmpx: read-only file system

Also very surprising, at least to me!

Curiously touch(1) will cause the file to be mirrored in the upper layer, yet it only calls utimensat(2) (not open())!

Now once the file is mirrored in the upper layer then init, who, etc. (and shell redirection to the utmpx file) all open the new writable union copy and work without complaint.

Note that new unique files (without any underlying read-only original) are, as expected, created without any problem (which is why the example of /var/obj from mount_union(8) works as advertised).

So I think this is surprising because mount_union(8) says:

    Requests to create or modify objects in uniondir are passed to the
    upper layer with the exception of a few special cases.  An attempt
    to open for writing a file which exists in the lower layer causes a
    copy of the entire file to be made to the upper layer, and then for
    the upper layer copy to be opened.  Similarly, an attempt to
    truncate a lower layer file to zero length causes an empty file to
    be created in the upper layer.  Any other operation which would
    ultimately require modification to the lower layer fails with EROFS.

To me it seems as if the claimed behaviour of "an attempt to open for writing a file which exists in the lower layer" is failing!
(assuming O_RDWR is such an attempt) and also given what the shell reports in my example above it seems "an attempt to truncate a lower layer file" is also failing.

Furthermore although I would actually argue that utimensat() is also "an attempt to open for writing" (in the moral sense, especially since the original idea of touch(1) was to read a byte and write it back), the implementation is, strictly speaking, not an open() at all and perhaps should really be in the category of "any other operation" and thus actually be failing!

So are these things I find surprising actually bugs, or am I confused by what mount_union(8) is vaguely saying?

For the record this is with a (slightly dated) 9.99.81 kernel on amd64.

--
                                        Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC     +1 250 762-7675           RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>     Avoncote Farms <woods%avoncote.ca@localhost>
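
For anyone wanting to reproduce the cases described above, here is a minimal sketch in plain sh.  The names probe_union and report are hypothetical helpers made up for this example, not existing tools.  It assumes the path given names a file that so far exists only in the read-only lower layer, and that the shell's "<>" redirection ends up as an open(O_RDWR) much like the one getutxent() attempts.  Beware that the truncate step empties the file if it succeeds.

```shell
#!/bin/sh
# Sketch only: probe_union and report are hypothetical helper names.
# WARNING: the truncate probe empties the file if it succeeds.

report() {
    # $1 = label, $2 = exit status of the probe that just ran
    if [ "$2" -eq 0 ]; then
        echo "$1: ok"
    else
        echo "$1: FAILED"
    fi
}

probe_union() {
    f=$1

    # Each redirection runs in a subshell so that a failure cannot abort
    # the calling shell; the shell's own error message is discarded.
    ( : <> "$f" ) 2>/dev/null
    report "open read-write (<>)" $?

    ( : >> "$f" ) 2>/dev/null
    report "open for append (>>)" $?

    ( : > "$f" ) 2>/dev/null
    report "truncate (>)" $?

    # touch goes last: per the report above it mirrors the file into the
    # upper layer, which would mask the failures being probed for.
    touch "$f" 2>/dev/null
    report "touch" $?
}

# Run the probes only when a path is given on the command line.
if [ $# -gt 0 ]; then
    probe_union "$1"
fi
```

Saved under some name of your choosing and run as root with /var/run/utmpx as its argument right after boot, the first three probes should correspond to the failing fopen() and redirection cases above, and the last to the touch(1) case.  Note that any probe that unexpectedly succeeds will copy the file up and so change the outcome of the probes after it.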