Subject: Device minor numbers conversion in COMPAT_NETBSD32
To: None <tech-kern@netbsd.org>
From: Quentin Garnier <cube@cubidou.net>
List: tech-kern
Date: 01/01/2006 02:32:00

Hi folks,

A very unfortunate oversight (I can't blame anyone, though; it was bound
to happen given the hack that made it possible) has led to what is
now a rather hairy situation in amd64's COMPAT_NETBSD32.

It looks like a small problem, but I think it is hard to work around.
i386 has a very hacky way of defining minor device numbers for disk
devices.  In order to remain compatible with /dev entries from the time
i386's MAXPARTITIONS was 8, the 20th bit of the minor number (value
524288) is used to encode partitions 9-16 of each disk device.

I.e., on i386:

% ls -l /emul/netbsd32/dev/wd1[gi]
brw-r----- 0, 14     /emul/netbsd32/dev/wd1g
brw-r----- 0, 524296 /emul/netbsd32/dev/wd1i

Whereas, on amd64:

% ls -l /dev/wd1[gi]
brw-r----- 0, 22 /dev/wd1g
brw-r----- 0, 24 /dev/wd1i
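
The arithmetic behind those four numbers can be sketched as follows.
This is only an illustration: the macro and function names are made up
for this example, and the constants are inferred from the listings
above rather than copied from the kernel headers.  It also assumes that
nothing above the 20th bit is in use.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants, inferred from the ls output above. */
#define OLD_MAXPARTITIONS   8u         /* i386's former MAXPARTITIONS */
#define MAXPARTITIONS      16u         /* partitions per disk today */
#define I386_HIGH_PART_BIT (1u << 19)  /* the "20th bit", i.e. 524288 */

/* i386 layout: unit * 8 + (part % 8), plus the high bit for the
 * second group of eight partitions (i..p). */
static uint32_t
i386_disk_minor(uint32_t unit, uint32_t part)
{
	assert(part < MAXPARTITIONS);
	return unit * OLD_MAXPARTITIONS + part % OLD_MAXPARTITIONS +
	    (part / OLD_MAXPARTITIONS) * I386_HIGH_PART_BIT;
}

/* amd64 layout: unit * 16 + part. */
static uint32_t
amd64_disk_minor(uint32_t unit, uint32_t part)
{
	assert(part < MAXPARTITIONS);
	return unit * MAXPARTITIONS + part;
}

/* Translate an i386-style disk minor into the amd64-style one. */
static uint32_t
i386_to_amd64_disk_minor(uint32_t minor)
{
	uint32_t low = minor & (I386_HIGH_PART_BIT - 1);
	uint32_t unit = low / OLD_MAXPARTITIONS;
	uint32_t part = low % OLD_MAXPARTITIONS +
	    ((minor & I386_HIGH_PART_BIT) ? OLD_MAXPARTITIONS : 0);

	return amd64_disk_minor(unit, part);
}
```

With wd1 being unit 1, 'g' partition 6 and 'i' partition 8, the i386
function yields 14 and 524296, the amd64 one 22 and 24, and the
translation maps 14 to 22 and 524296 to 24.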

I noticed this while trying to boot an amd64 kernel from an i386
partition located on wd1g.  Booting an installed i386 partition with an
amd64 kernel is something I'd really like to achieve, but this issue
makes it difficult.

So my first question is, do we want to allow this?  I.e., using a /dev
populated by an i386 MAKEDEV with an amd64 kernel.

If not, that settles the issue, though I have already said where I stand.

If yes, the question is where to do the conversion.  I've thought a bit
about that, and it's more complex than it seems once a file descriptor
is passed across emulation boundaries.

First, I suggest adding a field to struct emul that points to a
conversion function.  Easy enough.  But that still leaves open the
question of when to do the conversion.
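
A rough sketch of what such a field could look like is given here.
Everything in it is hypothetical: struct emul_sketch is a cut-down
stand-in and not the kernel's struct emul, e_devconv is an invented
field name, and struct sketch_dev replaces dev_t to keep the example
self-contained.  The wd block major of 0 is taken from the listings
above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_WD_BMAJOR 0u  /* wd block major, per the ls output */

/* Stand-in for dev_t, split into its two halves. */
struct sketch_dev {
	uint32_t major;
	uint32_t minor;
};

/* Hypothetical, cut-down emulation descriptor. */
struct emul_sketch {
	const char *e_name;
	/* Proposed hook; NULL means "no conversion needed". */
	struct sketch_dev (*e_devconv)(struct sketch_dev);
};

/* i386 -> amd64 conversion, applied to wd block devices only. */
static struct sketch_dev
netbsd32_devconv_sketch(struct sketch_dev d)
{
	if (d.major == SKETCH_WD_BMAJOR) {
		uint32_t low = d.minor & ((1u << 19) - 1);
		uint32_t part = low % 8u + ((d.minor >> 19) & 1u) * 8u;

		d.minor = (low / 8u) * 16u + part;
	}
	return d;
}

/* Single entry point that callers would use. */
static struct sketch_dev
emul_devconv(const struct emul_sketch *e, struct sketch_dev d)
{
	assert(e != NULL);
	return e->e_devconv != NULL ? e->e_devconv(d) : d;
}

static const struct emul_sketch emul_native_sketch =
    { "netbsd", NULL };
static const struct emul_sketch emul_netbsd32_sketch =
    { "netbsd32", netbsd32_devconv_sketch };
```

Calling emul_devconv() with the netbsd32 descriptor on major 0, minor
524296 gives minor 24, while the native descriptor returns the device
number unchanged.  A real hook would have to cover every affected disk
driver, not just wd.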

If the conversion is done at each syscall, it will cause trouble when
the file descriptor is passed from a native process to a netbsd32
process, or the other way around:  the second process will end up using
a different device from the one that was actually opened.

Another solution is to tag the vnode with the real device number at
the time it is opened.  The emulation (native or netbsd32) of the
process opening the vnode decides what the actual device is.  This will
cause trouble when the device is opened several times concurrently.
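
To make the trade-off concrete, here is a small user-space model of the
tagging idea.  It is a sketch under heavy assumptions: the structures
and function names are invented for this example and do not correspond
to the real vnode or specfs code, and only the minor number is tracked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*minor_conv_t)(uint32_t);

/* Hypothetical process: only carries its emulation's converter. */
struct proc_sketch {
	minor_conv_t p_minorconv;  /* NULL for a native process */
};

/* Hypothetical device vnode. */
struct vnode_sketch {
	uint32_t v_fsminor;    /* minor as stored in the device node */
	uint32_t v_realminor;  /* tag: minor resolved at open time */
	int v_opencount;
};

/* i386 -> amd64 disk minor, same arithmetic as in the listings. */
static uint32_t
compat32_minor(uint32_t minor)
{
	uint32_t low = minor & ((1u << 19) - 1);

	return (low / 8u) * 16u + low % 8u + ((minor >> 19) & 1u) * 8u;
}

/* The opening process's emulation decides the real device. */
static void
spec_open_sketch(struct vnode_sketch *vp, const struct proc_sketch *p)
{
	assert(vp != NULL && p != NULL);
	vp->v_realminor = p->p_minorconv != NULL ?
	    p->p_minorconv(vp->v_fsminor) : vp->v_fsminor;
	vp->v_opencount++;
}

/* Later I/O uses the tag, whatever the caller's emulation is. */
static uint32_t
spec_io_minor_sketch(const struct vnode_sketch *vp)
{
	return vp->v_realminor;
}
```

A descriptor opened by a netbsd32 process on a node with minor 524296
resolves to 24 and stays that way when handed to a native process.  If
a native process then opens the same vnode while it is still in use,
the single tag is overwritten with 524296, which is the concurrent-open
trouble mentioned above.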

At this time I tend to favour the latter solution, as it introduces
fewer tests in specfs code (and the few other places that use the devsw
structs, like uvm_vnode.c), which means it's less intrusive.  Also,
very few devices will need that hack, and those that are relevant are
not likely to be shared between native and netbsd32 processes.

Of course, there's still the solution of changing the way minor numbers
are allocated in either arch.  But it would get us a lot of angry
users...

Thoughts?

-- 
Quentin Garnier - cube@cubidou.net - cube@NetBSD.org
"When I find the controls, I'll go where I like, I'll know where I want
to be, but maybe for now I'll stay right here on a silent sea."
KT Tunstall, Silent Sea, Eye to the Telescope, 2004.
