tech-net archive


re: COMPAT_50 vs NET_RT_IFLIST



Just a reminder, there are actually TWO issues where compat_50 is
broken.  Both breakages occur between the 7.0 and 8.0 releases, but
at different times and for different reasons.

First, the sysctl(3) interface used by getifaddrs(3) is broken.  The
test is simple:

	1. Build a release, and install it (qemu VM is fine)
	2. Create a /chroot52 directory, and unpack the base.tgz
	   from NetBSD-5.2
	3. Boot the result, log in as root, and execute the command

		# chroot /chroot52 ifconfig -l

	4. A working system will display lo0 (and for qemu, wm0)
	   while a broken system displays a blank line.

As indicated earlier, this problem was introduced between 2016-09-21 at
10:00:00 UTC (working) and 2016-09-21 at 19:18:10 UTC (broken).  We
cannot get much more specific because there was some build breakage
during this roughly 9-hour interval.

The second breakage involves the routing socket itself, and can be
reproduced with the following steps:

	1. Build a release, and install it (qemu VM is fine)
	2. Create a /chroot52 directory, and unpack the base.tgz
	   from NetBSD-5.2
	3. Boot the result, log in as root, and execute the commands

		# chroot /chroot52 route monitor &
		# ifconfig lo0 alias 1.2.3.4

	4. A working system will display a couple of routing
	   table update messages such as

		RTM_ONEWADDR
		got message of size 152 on Wed May  1 04:43:57 2019
		RTM_ADD: Add Route: len 152, pid 463, seq 0, errno 0, flags: <UP,HOST> locks:  inits:
		sockaddrs: <DST,GATEWAY>
		 1.2.3.4 lo0

	   while a broken system displays nothing.

This second breakage was introduced between 2017-04-11 at 13:50 UTC
(working) and 2017-04-11 at 14:00 UTC (broken).  During that interval
there was only one commit:

	Module Name:    src
	Committed By:   roy
	Date:           Tue Apr 11 13:55:55 UTC 2017

	Modified Files:
		src/share/man/man4: route.4
		src/sys/net: raw_cb.h raw_usrreq.c route.h rtsock.c

	Log Message:
	Add RO_MSGFILTER socket option to PF_ROUTE to filter out
	un-wanted route(4) messages.

	Inspired by the ROUTE_MSGFILTER equivalent in OpenBSD,
	but with an API which allows the full range of potential
	message types.
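
To make the idea in that log message concrete, here is a small
user-space model of a per-listener message filter.  It is only a
sketch: the struct, the function names and the message type numbers
used with it are invented for illustration and are not taken from
rtsock.c.  The one property it is meant to show is that a listener
which never installs a filter (as an old ``route monitor'' would not)
is expected to keep receiving every message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a per-listener filter; not kernel code. */
struct msg_filter {
	uint8_t	bits[32];	/* one bit per possible 8-bit message type */
	int	active;		/* 0: no filter installed, deliver everything */
};

/* Install a filter listing the wanted types; ntypes == 0 clears it. */
void
filter_set(struct msg_filter *f, const uint8_t *types, size_t ntypes)
{
	size_t i;

	assert(f != NULL);
	memset(f->bits, 0, sizeof(f->bits));
	for (i = 0; i < ntypes; i++)
		f->bits[types[i] / 8] |= (uint8_t)(1u << (types[i] % 8));
	f->active = (ntypes > 0);
}

/* Return 1 if a message of the given type should be delivered. */
int
filter_wants(const struct msg_filter *f, uint8_t type)
{
	assert(f != NULL);
	if (!f->active)
		return 1;	/* unfiltered listeners see all messages */
	return (f->bits[type / 8] >> (type % 8)) & 1;
}
```

With no filter set, filter_wants() returns 1 for every type; after
filter_set() with a list of types, only those types pass.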




The failures (``ifconfig -l'' and ``route monitor'') do NOT occur when
running on a 7.0 base system.

So it would seem that the problem is specifically with the compat_50
code, and was introduced between 7.0 and 8.0.

OK, so armed with these two data points (7.0 ==> GOOD, 8.0 ==> BAD) I
was able to run a bisect to identify the culprit.

Sources from 2016-09-21 at 10:00:00 UTC ==> GOOD
Sources from 2016-09-21 at 19:18:10 UTC ==> BAD
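
For reference, the bisect itself is just a binary search between a
known-good and a known-bad point.  A minimal sketch follows, assuming
the candidate source snapshots are numbered in commit order and that
once the breakage appears it stays.  The predicate is a hypothetical
stand-in for "build a release, install it, and run the chroot test";
bad_from() exists only so the sketch can be tried out.  Snapshots that
fail to build are not handled, which is exactly what limits how far
such a window can be narrowed.

```c
#include <assert.h>
#include <stddef.h>

/* Returns nonzero if the snapshot with this index shows the breakage. */
typedef int (*is_bad_fn)(size_t rev, void *ctx);

/*
 * Given a known-good index and a later known-bad index, return the
 * index of the first bad snapshot.  Each loop iteration halves the
 * interval while keeping "good is good" and "bad is bad" true.
 */
size_t
bisect_first_bad(size_t good, size_t bad, is_bad_fn is_bad, void *ctx)
{
	assert(good < bad);
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid, ctx))
			bad = mid;
		else
			good = mid;
	}
	return bad;
}

/* Example predicate: everything from *ctx onwards is broken. */
int
bad_from(size_t rev, void *ctx)
{
	return rev >= *(size_t *)ctx;
}
```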


There are several commits during this time window, but the build was
broken for various reasons for several hours (as shown by the babylon5
test logs).  The only commits that seem relevant are those which start
with the following:

	Module Name:    src
	Committed By:   roy
	Date:           Wed Sep 21 10:50:23 UTC 2016

	Modified Files:
	        src/share/man/man4: route.4
	        src/sys/compat/common: Makefile
	        src/sys/compat/net: if.h route.h
	        src/sys/net: if.h route.h rtsock.c
	        src/sys/rump/net/lib/libnet: Makefile
	        src/sys/sys: socket.h
	Added Files:
	        src/sys/compat/common: rtsock_70.c

	Log Message:
	Add ifam_pid and ifam_addrflags to ifa_msghdr.
	Re-version RTM_NEWADDR, RTM_DELADDR, RTM_CHGADDR and
	NET_RT_IFLIST.  Add compat code for old version.

Roy, can you please look into this further?  Thanks!

Note that the breakage for the 5.2 version of ``ifconfig -l'' began
with this commit, yet the 5.2 version of ``route monitor'' continues
to produce "reasonable" looking results.

	# chroot /chroot52 route monitor &
	# ifconfig lo0 alias 1.2.3.4
	RTM_ONEWADDR
	got message of size 152 on Wed May  1 04:43:57 2019
	RTM_ADD: Add Route: len 152, pid 463, seq 0, errno 0, flags: <UP,HOST> locks: inits:
	sockaddrs: <DST,GATEWAY>
	 1.2.3.4 lo0

The ``route monitor'' starts failing at some point after
2016-09-21 19:18:10 UTC (it definitely fails as of 2017-05-27
00:00 UTC).


Hopefully this narrows things enough for someone familiar with the
rtsock stuff to help us make some forward progress.



On Tue, 30 Apr 2019, Paul Goyette wrote:

Some additional testing (on a -current base system) shows that the
problem is almost certainly related to compat_50 code.  Using the
ifconfig from 6.0 or newer does not display the problem.

Also, the issue is probably wider than just the sysctl stuff, since
running a 5.2 version of ``route monitor'' produces no output when
adding or changing an address on lo0;  the 6.0 version of route
monitor produces correct output.

Furthermore, previous testing shows that the problem also occurs on
an 8.0 base system with 5.2 userland.  (I have not tested a 7.0 base
system.)




On Mon, 29 Apr 2019, Paul Goyette wrote:

Alas, making the suggested changes does not help.  Same results as
before:

Userland and Kernel both -current with suggested changes (the diffs
are attached to this Email):

	# ifconfig -l
	wm0 lo0
	# ifconfig lo0
	lo0: flags=0x8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33624
	        inet 127.0.0.1/8 flags 0x0
	        inet6 ::1/128 flags 0x20<NODAD>
	        inet6 fe80::1%lo0/64 flags 0x0 scopeid 0x2
	#


And with a 5.2 base system loaded in /chroot52 directory:

	# chroot /chroot52 ifconfig -l

	# chroot /chroot52 ifconfig lo0

	#


On Mon, 29 Apr 2019, matthew green wrote:

I still cannot explain how things got broken between 5.2 and 8.0.  I
will defer to those who are more expert in this area than am I.  My
suspicion is that the breakage is related to sys/socket.h rev 1.99
which versioned AF_{,O}ROUTE for some 64-bit cleanliness.

i think i have a guess about the problem.

sys/net/if.h, sys/net/route.h, and sys/compat/net/if.h all
have this code:

/*
 * Message format for use in obtaining information about interfaces from
 * sysctl and the routing socket. We need to force 64-bit alignment if we
 * aren't using compatiblity definitons.
 */
#if !defined(_KERNEL) || !defined(COMPAT_RTSOCK)
#define __align64       __aligned(sizeof(uint64_t))
#else
#define __align64
#endif
struct if_msghdr {
       u_short ifm_msglen __align64;

but i think this comment is wrong.

the compat structures are defined in the compat headers and
the above structure should never change, however when the
compat handling code wants to talk to the *real* structure it
will get this adjusted one (without the align), and thus
it will copy the wrong portions out from it.

the fix may be as simple as removing this from these headers
(leaving it always defined for the current defs), and making
sure that the compat headers have the right alignment (my
quick look seems ok.)
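
The effect being guessed at here can be shown with a toy example.  This
is only a sketch: the two structs are a hypothetical, much shortened
header with invented field names, not the real if_msghdr, and the
GCC/Clang attribute is spelled out in place of the __aligned macro.
The sizes mentioned in the comments hold on common ABIs where int32_t
is 4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header as seen when the alignment macro expands to nothing. */
struct hdr_natural {
	uint16_t msglen;
	uint8_t	 version;
	uint8_t	 type;
	int32_t	 addrs;
	int32_t	 flags;
};				/* 12 bytes, 4-byte aligned */

/* The same fields with 64-bit alignment forced on the first member. */
struct hdr_align64 {
	uint16_t msglen __attribute__((aligned(sizeof(uint64_t))));
	uint8_t	 version;
	uint8_t	 type;
	int32_t	 addrs;
	int32_t	 flags;
};				/* padded to 16 bytes, 8-byte aligned */

/* Where data following the header is assumed to start in each view. */
size_t
payload_offset_natural(void)
{
	return sizeof(struct hdr_natural);
}

size_t
payload_offset_align64(void)
{
	return sizeof(struct hdr_align64);
}

/* Bytes by which code using the two definitions would disagree. */
size_t
layout_skew(void)
{
	assert(sizeof(struct hdr_align64) >= sizeof(struct hdr_natural));
	return sizeof(struct hdr_align64) - sizeof(struct hdr_natural);
}
```

In this toy case the members keep the same offsets in both structs;
only the overall size and alignment change.  So code compiled with one
definition and code compiled with the other would agree on the header
fields but disagree about where anything after the header starts.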

this will, obviously, need a recompile of the newer kernel.


.mrg.





+--------------------+--------------------------+-----------------------+
| Paul Goyette       | PGP Key fingerprint:     | E-mail addresses:     |
| (Retired)          | FA29 0E3B 35AF E8AE 6651 | paul%whooppee.com@localhost     |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoyette%netbsd.org@localhost   |
+--------------------+--------------------------+-----------------------+



