tech-kern archive


Re: mount_union(8) vs. open(O_RDWR)



> On 2. Dec 2021, at 11:55, Greg A. Woods <woods%planix.ca@localhost> wrote:
> 
> I've been experimenting with more complete install.img and cdrom ISOs,
> i.e. with complete tools and filesystems, and in doing so I've come up
> with (what I think is) a better way to expose and use a fully populated
> /var when the underlying disk media is read-only (which even install.img
> could be).
> 
> So to do this my experimental installer's /etc/rc includes:
> 
>    mount -t tmpfs tmpfs /tmp
>    mkdir /tmp/uvar
>    mount -t union /tmp/uvar /var
> 
> So far, so good.  Everything looks exactly right, and I can manually
> create files in /var/tmp, for example.
> 
> # df
> Filesystem           1K-blocks         Used        Avail %Cap Mounted on
> /dev/xbd4a             2720338      2720338            0 100% /
> tmpfs                     2016          956         1060  47% /dev
> tmpfs                  3904336            8      3904328   0% /tmp
> tmpfs                  3904336            8      3904328   0% /etc
> <above>:/tmp/uvar      6624672      2720344      3904328  41% /var
> # mount
> /dev/xbd4a on / type cd9660 (read-only, local)
> tmpfs on /dev type tmpfs (union, local)
> tmpfs on /tmp type tmpfs (local)
> tmpfs on /etc type tmpfs (union, local)
> <above>:/tmp/uvar on /var type union (local)
> 
> (as a side note, /var has to be a full union filesystem, not
> just a union option to mount, since "the union option affects the file
> system name space only at the mount point itself; it does not apply
> recursively to subdirectories", but of course /var has several
> subdirectories, some/most of which need to be writable.  /dev might also
> need to be a real union filesystem too, not just a union mount, since it
> too contains sub-directories, i.e. just in case anything needs to be
> created within one of them)
> 
> But I then find some odd error messages from init on the console:
> 
>    init: can't add utmpx record for `system boot': Bad file descriptor
>    init: can't add utmpx record for `runlevel': Bad file descriptor
>    init: can't add utmpx record for `console': Bad file descriptor
> 
> Note you can ignore the EBADF -- that's from an over-written errno (as
> far as I can see).  The underlying errno is (as would be expected) EROFS
> (and this is confirmed with ktrace on related tools accessing the same
> file, e.g. who, which BTW, when run as root will write to an empty utmpx
> file!).  Init reports these errors from calls to pututxline(), which
> calls getutxent(), and that tries to open the utmpx file with
> fopen("re+") (and then it tries "we+", before giving up and using "re").
> Under the hood the first two translate, of course, into
> open(O_RDWR|otherbits) (as '+' always upgrades the open mode to O_RDWR).
> 
> So, this is surprising!  (see below for why, if it's not already obvious)
> 
> The shell also can't write to, or truncate, files from the underlying
> filesystem either:
> 
>    # echo -n >> /var/run/utmpx
>    sh: cannot create /var/run/utmpx: read-only file system
>    # echo -n > /var/run/utmpx
>    sh: cannot create /var/run/utmpx: read-only file system
> 
> Also very surprising, at least to me!
> 
> Curiously touch(1) will cause the file to be mirrored in the upper
> layer, yet it only calls utimensat(2) (not open())!
> 
> Now once the file is mirrored in the upper layer then init, who,
> etc. (and shell redirection to the utmpx file) all open the new writable
> union copy and work without complaint.
> 
> Note that new unique files (without any underlying read-only original)
> are, as expected, created without any problem (which is why the example
> of /var/obj from mount_union(8) works as advertised).
> 
> So I think this is surprising because mount_union(8) says:
> 
>     Requests to create or modify objects in uniondir are passed to the upper
>     layer with the exception of a few special cases.  An attempt to open for
>     writing a file which exists in the lower layer causes a copy of the
>     entire file to be made to the upper layer, and then for the upper layer
>     copy to be opened.  Similarly, an attempt to truncate a lower layer file
>     to zero length causes an empty file to be created in the upper layer.
>     Any other operation which would ultimately require modification to the
>     lower layer fails with EROFS.
> 
> To me it seems as if the claimed behaviour of "an attempt to open for
> writing a file which exists in the lower layer" is failing!  (assuming
> O_RDWR is such an attempt) and also given what the shell reports in my
> example above it seems "an attempt to truncate a lower layer file" is
> also failing.  Furthermore although I would actually argue that
> utimensat() is also "an attempt to open for writing" (in the moral
> sense, especially since the original idea of touch(1) was to read a byte
> and write it back), the implementation is, strictly speaking, not an
> open() at all and perhaps should really be in the category of "any other
> operation" and thus actually be failing!
> 
> So are these things I find surprising actually bugs, or am I confused by
> what mount_union(8) is vaguely saying?
> 
> For the record this is with a (slightly dated) 9.99.81 kernel on amd64.

This behaviour comes from sys/fs/union/union_vnops.c::union_access():

 /*
  * Check access permission on the union vnode.
  * The access check being enforced is to check
  * against both the underlying vnode, and any
  * copied vnode.  This ensures that no additional
  * file permissions are given away simply because
  * the user caused an implicit file copy.
  */
 int
 union_access(void *v)

Because the underlying node is on a file system mounted read-only, a check
for write access will always fail with EROFS if the underlying node exists.
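
The failure can be observed from userland with a small probe. This is only
a sketch: probe_open() is a hypothetical helper name, not part of any
library, and it simply reports the errno that open(2) leaves behind.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to open `path` with `flags` and return 0 on
 * success or the errno value from open(2) on failure.  The descriptor is
 * closed again immediately; only the outcome of the open is of interest.
 */
int
probe_open(const char *path, int flags)
{
	int fd;

	assert(path != NULL);

	fd = open(path, flags);
	if (fd == -1)
		return errno;	/* read errno before any other call can clobber it */
	close(fd);
	return 0;
}
```

On the setup from the quoted message, probe_open("/var/run/utmpx", O_RDWR)
would be expected to return EROFS as long as no upper-layer copy exists,
while the same call with O_RDONLY should succeed.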

Maybe we should check for read access on the underlying file system,
copy the file up, and then check the upper node.
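
To make the difference concrete, here is a toy userland model of the two
orderings. It is not the kernel code: every name in it (toy_node,
toy_union_access_current, toy_union_access_proposed, the WANT_* bits) is
made up for illustration, and the "current" variant is only modelled on
the behaviour reported in the quoted message (writes work once an upper
copy exists, and fail with EROFS while only the lower node exists).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum { WANT_READ = 1, WANT_WRITE = 2 };	/* hypothetical access bits */

struct toy_node {			/* hypothetical stand-in for a vnode */
	bool exists;
	bool readonly_mount;		/* node lives on a read-only mount */
	int perms;			/* WANT_* bits the caller is granted */
};

/* Access check on a single layer: read-only mount first, then permissions. */
static int
toy_node_access(const struct toy_node *n, int want)
{
	assert(n != NULL && n->exists);
	if ((want & WANT_WRITE) && n->readonly_mount)
		return EROFS;
	if ((want & n->perms) != want)
		return EACCES;
	return 0;
}

/* Modelled on the reported behaviour: the lower node is asked for write access. */
static int
toy_union_access_current(const struct toy_node *upper,
    const struct toy_node *lower, int want)
{
	if (upper->exists)
		return toy_node_access(upper, want);
	if (lower->exists)
		return toy_node_access(lower, want);	/* EROFS for writes */
	return ENOENT;
}

/* Suggested ordering: read check below, copy up, then check the upper node. */
static int
toy_union_access_proposed(struct toy_node *upper,
    const struct toy_node *lower, int want)
{
	int error;

	if (!upper->exists && lower->exists) {
		if (!(want & WANT_WRITE))
			return toy_node_access(lower, want);
		error = toy_node_access(lower, WANT_READ);
		if (error != 0)
			return error;
		/* "Copy up": in this toy the copy inherits the lower permissions. */
		upper->exists = true;
		upper->perms = lower->perms;
	}
	if (upper->exists)
		return toy_node_access(upper, want);
	return ENOENT;
}
```

In this model the copy inherits the lower node's permission bits, so the
final check on the upper node still refuses anything the lower node would
have refused, which is the concern raised in the union_access() comment.
Note that the toy performs the copy before the final check; whether a real
implementation could afford that side effect is a separate question.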

--
J. Hannken-Illjes - hannken%mailbox.org@localhost



