Subject: Re: Device minor numbers conversion in COMPAT_NETBSD32
To: Bill Studenmund <wrstuden@netbsd.org>
From: Quentin Garnier <cube@cubidou.net>
List: tech-kern
Date: 01/03/2006 09:06:42

On Mon, Jan 02, 2006 at 06:55:30PM -0800, Bill Studenmund wrote:
> On Sun, Jan 01, 2006 at 02:32:00AM +0100, Quentin Garnier wrote:
[...]
> > I noticed that when trying to boot an amd64 kernel over an i386 partition
> > which was on wd1g.  Booting an installed i386 partition with an amd64
> > kernel is something I'd really like to achieve, but this issue actually
> > makes it difficult.
> >
> > So my first question is, do we want to allow this?  I.e., using a /dev
> > populated by an i386 MAKEDEV with an amd64 kernel.
>
> Unfortunately we can't do that. The /dev formats aren't compatible.

I'd *really* like to find a way :)

[...]
> > First, I suggest adding a field to struct emul that points to a
> > conversion function.  Easy enough.  But it still leaves aside the
> > question of when to do the conversion.
> >
> > If the conversion is done at each syscall, it will cause trouble when
> > the file descriptor is passed from a native process to a netbsd32
> > process, or the other way around:  the second process will try using a
> > different device, which wasn't opened.
> >
> > Another solution is to tag the vnode with the real device number at
> > the time it was opened.  The emulation (native or netbsd32) of the process
> > opening the vnode will decide what the actual device is.  This will
> > cause trouble when the device is opened several times concurrently.
>
> We already effectively do this; we resolve the major & minor at open.

No.  Calls to {c,b}devsw_lookup are made on every operation, not just at
open (and for a good reason:  an LKM can disappear).
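
To illustrate why the lookup is repeated, here is a minimal userland
sketch, not the kernel code.  Every name in it (toy_devsw, toy_table,
toy_devsw_lookup, toy_read) is hypothetical; only the idea of looking the
switch entry up again on each operation comes from the discussion above.

```c
#include <errno.h>
#include <stddef.h>

#define TOY_NMAJOR 4	/* hypothetical size of the switch table */

/* Hypothetical, stripped-down stand-in for a device switch entry. */
struct toy_devsw {
	int (*d_read)(int minor);
};

/* A slot is NULL when no driver is attached for that major number. */
static const struct toy_devsw *toy_table[TOY_NMAJOR];

/* Trivial driver used for demonstration: every read succeeds. */
static int
toy_null_read(int minor)
{
	(void)minor;
	return 0;
}

static const struct toy_devsw toy_null = { toy_null_read };

/* Return the switch entry for a major number, or NULL if none. */
static const struct toy_devsw *
toy_devsw_lookup(int major)
{
	if (major < 0 || major >= TOY_NMAJOR)
		return NULL;
	return toy_table[major];
}

/*
 * Look the driver up again on every operation instead of caching the
 * pointer at open time, so an unloaded driver yields an error rather
 * than a stale pointer.
 */
static int
toy_read(int major, int minor)
{
	const struct toy_devsw *sw = toy_devsw_lookup(major);

	if (sw == NULL)
		return ENXIO;	/* driver went away since open */
	return sw->d_read(minor);
}
```

With toy_table[1] set to &toy_null, toy_read(1, 5) succeeds; once the
slot is cleared again, the same call returns ENXIO.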

> > At this time I tend to favour the last solution, as it introduces fewer
> > tests in specfs code (and the few other places that use the devsw
> > structs, like in uvm_vnode.c), which means it's less intrusive.  Also,
> > very few devices will need that hack, and those that are relevant are
> > not likely to be shared between native and netbsd32 processes.
> >
> > Of course, there's still the solution of changing the way minor numbers
> > are allocated in either arch.  But it would get us a lot of angry
> > users...
>
> Unfortunately it's too late.

That's not a positive way of considering the issue.

> The problem with what you propose is that you're assuming that amd64
> binaries will only see amd64 /dev, and i386 binaries will only see
> i386 /dev. Thus we can convert based on emulation. However if we only have
> one /dev, which is what we would need to do to boot an amd64 kernel over
> an i386 partition, we lose. The problem is that we need to know what kind
> of /dev we have, not what kind of binary is opening it. :-(

There are two main cases to consider:

1.  User wants to boot an i386 system with an amd64 kernel (and please
    note that with the VM_TOPDOWN patch I sent on the port-amd64 list
    yesterday, we gain 1 GB of virtual space by using COMPAT_NETBSD32
    instead of NetBSD/i386, so it can matter in some situations).

2.  User wants to use a limited number of i386 binaries in an amd64
    environment.

Case 1 means we have to assume that COMPAT_NETBSD32 binaries will only
see an i386 /dev (which will be the only one in the system).

That means that in order for case 2 to still work, we'll require that
one step of the COMPAT_NETBSD32 setup is to populate /emul/netbsd32/dev
with i386's MAKEDEV.  Through the compat path rewriting, i386 binaries
will get the correct path and access the correct device.
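
A rough userland sketch of that path rewriting follows.  The function
name emul_rewrite_path and the use of access(2) as the existence check
are assumptions made for illustration; the kernel does its own lookup
under the emulation root, and this is not that code.

```c
#include <stdio.h>
#include <unistd.h>

#define EMUL_ROOT "/emul/netbsd32"

/*
 * Write into buf the path a netbsd32 process would end up opening:
 * the copy under EMUL_ROOT if one exists, otherwise the native path.
 * Returns 0 on success, -1 if buf is too small for a candidate path.
 */
static int
emul_rewrite_path(const char *path, char *buf, size_t len)
{
	int n;

	/* Only absolute paths are tried under the emulation root. */
	if (path[0] == '/') {
		n = snprintf(buf, len, "%s%s", EMUL_ROOT, path);
		if (n < 0 || (size_t)n >= len)
			return -1;
		if (access(buf, F_OK) == 0)
			return 0;	/* emulation copy exists, use it */
	}

	/* Fall back to the native path. */
	n = snprintf(buf, len, "%s", path);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

With /emul/netbsd32/dev populated, a lookup of /dev/wd1g from an i386
binary would yield /emul/netbsd32/dev/wd1g; without it, the native
/dev/wd1g is used.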

In any case, case 2 doesn't matter much for the devices we're
considering.  If the user wants to manipulate disk devices, he'll do
that with the native tools instead of the i386 ones.  And besides, the
vast majority of devices don't need minor number conversion.
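
For the few devices that do need it, the conversion itself could look
like the sketch below.  It assumes disk minors are encoded as
unit * MAXPARTITIONS + partition, with 8 partitions per unit on i386 and
16 on amd64; those values are my assumption and are not stated in this
thread.  The function name and the is_disk flag are hypothetical.

```c
#include <stdint.h>

/* Assumed partitions per disk unit on each port (not from this thread). */
#define I386_MAXPARTITIONS	8
#define AMD64_MAXPARTITIONS	16

/*
 * Convert an i386 disk minor number to the amd64 minor number naming
 * the same unit and partition.  Devices that are not disks pass through
 * unchanged, which covers the vast majority of devices.
 */
static uint32_t
netbsd32_minor_to_native(int is_disk, uint32_t i386_minor)
{
	uint32_t unit, part;

	if (!is_disk)
		return i386_minor;

	unit = i386_minor / I386_MAXPARTITIONS;
	part = i386_minor % I386_MAXPARTITIONS;
	return unit * AMD64_MAXPARTITIONS + part;
}
```

Under those assumptions, wd1g (unit 1, partition 6) is minor 14 in an
i386 /dev and minor 22 in an amd64 one.  A function of this shape is
what the proposed conversion pointer in struct emul would refer to.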

-- 
Quentin Garnier - cube@cubidou.net - cube@NetBSD.org
"When I find the controls, I'll go where I like, I'll know where I want
to be, but maybe for now I'll stay right here on a silent sea."
KT Tunstall, Silent Sea, Eye to the Telescope, 2004.
