Re: Automated report: NetBSD-current/i386 test failure

To: current-users%NetBSD.org@localhost
Subject: Re: Automated report: NetBSD-current/i386 test failure
From: Robert Elz <kre%munnari.OZ.AU@localhost>
Date: Sat, 27 May 2023 04:05:58 +0700

    Date:        Fri, 26 May 2023 02:29:05 +0000 (UTC)
    From:        NetBSD Test Fixture <bracket%NetBSD.org@localhost>
    Message-ID:  <168506814541.4954.15559262820055138705%babylon5.netbsd.org@localhost>

  | The newly failing test cases are:
  |
  |     fs/nfs/t_rquotad:get_nfs_be_1_both
  |     fs/nfs/t_rquotad:get_nfs_be_1_group
  |     fs/nfs/t_rquotad:get_nfs_be_1_user
  |     fs/nfs/t_rquotad:get_nfs_le_1_both
  |     fs/nfs/t_rquotad:get_nfs_le_1_group
  |     fs/nfs/t_rquotad:get_nfs_le_1_user
  |     lib/librumphijack/t_tcpip:nfs
  |     lib/librumphijack/t_vfs:cpcopy
  |     lib/librumphijack/t_vfs:ln_nox
  |     lib/librumphijack/t_vfs:mv_nox
  |     lib/librumphijack/t_vfs:paxcopy

All those failures are caused by the same thing (when one is fixed,
all will be, effectively)

This change:

  |     2023.05.25.17.37.05 kre src/sbin/mount/pathadj.c,v 1.4

made it happen.   That's the one that causes the mount_xxx programs
to exit(1) if realpath(3) fails on one of the pathnames - that of
the block device (or equivalent), or that of the mount point.  Some
of the mount_xxx programs only have the mount point, that distinction
isn't important here.

What these programs do, is

	mount_ffs /dk /somewhere

The /somewhere isn't the issue, it is /dk

/dk (in this rump environment has been set up via the -d
arg to rump_server which sets up a drivespec - a pseudo kind
of device that maps rump kernel space into real filesystem
space.   Of itself that would be OK, but all this uses rump's
magic etfs filesystem, which allows paths to be resolved,
a reference to /dk will work (in rump) but the file doesn't
actually exist in the root directory (in this instance) and
if that directory is read, /dk will not be found.

This is a problem for realpath() which is entrusted with checking
the path passed in, to make sure it is in canonical form, which
the mount_xxx programs rely upon, so that users can use any
workable name for what they want to mount, but the mounts will
always mount the same thing, using the canonical paths.

Since /dk doesn't really exist, even though it can be resolved,
realpath(3) has a problem, and fails.

That always happened - but until recently, that failure was non
fatal, and the mount programs simply ignored it.  In general that's
not safe, as was pointed out in a recent thread (or two) bad
things can happen.

realpath(3) is defined as char *realpath(char *path, char *canon)
(typically using different names for the args, but for here, those
work well.)   That takes path, and fills in the buffer passed in
via canon with the canonical form of path.   It then returns canon.
If it fails, it returns NULL - in the POSIX spec, what is in canon
then is undefined (applications must not use it).   But in NetBSD
we define it, it is the path that failed (more or less a prefix of
what would be the canonical path, if only there wasn't an error).

In our case, that will be /dk - as that's the path that realpath()
cannot find to verify.   Until this past week, pathadj() in the mount_xxx
program would have written an error (warning) to stderr - which the
tests carefully simply ignore, and them the mount_xxx program would
have simply assumed that the path filled in was complete, and good,
and gone ahead and used it.  Since that's what the test wants to happen,
and it actually works, as when the mount() sys call looks up /dk the
etfs magic makes that work, and the lookup (and mount) succeeds.

Now when pathadj() sees realpath fail, and since it is not permitted to
fail, it simply exits (exit(1)) and that causes the test to fail.

Right now, I'm not sure what to do about this - magic paths which
resolve, but don't appear in the filesystem, aren't something that
anyone needs to deal with.

Reverting the pathadj() change, just so these tests go back to succeeding
is the wrong thing to do - real uses of mount_xxx not involving rump,
should get the protection this provides.

For now these tests are just going to have to keep failing - if anyone is
concerned about that, by all means either disable them, or make them
skip, to avoid counting as failures.

I'll keep looking, and see if there is a reasonable path forward, which
allows rump's bizarre etfs to function as needed, without completely
breaking normal unix pathname resolution semantics.

If there doesn't seem to be any good solution, then some kind of rump
specific hack in pathadj() might be required, to make things keep running
the old way when doing this kind of mount.   That's truly ugly, 
but doable -- I am hoping for a better solution than that, but don't
see it yet.

Note that this is 100% rump self-inflicted pain - this issue will not,
in any way, affect the way any real system behaves.

Note also that we can't just change the test to ignore this error
value from exit(), as when that happens, the mount is no longer being
attempted - it simply isn't done at all.   That would be fatal for
the tests which are counting upon the mount succeeding.

kre

Follow-Ups:
- Re: Automated report: NetBSD-current/i386 test failure
  - From: Robert Elz
- Re: Automated report: NetBSD-current/i386 test failure
  - From: Christos Zoulas

References:
- Automated report: NetBSD-current/i386 test failure
  - From: NetBSD Test Fixture

Prev by Date: Re: 5G wifi connection
Next by Date: daily CVS update output
Previous by Thread: Automated report: NetBSD-current/i386 test failure
Next by Thread: Re: Automated report: NetBSD-current/i386 test failure
Indexes:

Home | Main Index | Thread Index | Old Index