NetBSD-Bugs archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: kern/52643: npfctl reload on rump kernels fails randomly
On Mon, Oct 23, 2017 at 4:45 PM, <ozaki-r%netbsd.org@localhost> wrote:
>>Number: 52643
>>Category: kern
>>Synopsis: npfctl reload on rump kernels fails randomly
>>Confidential: no
>>Severity: serious
>>Priority: medium
>>Responsible: kern-bug-people
>>State: open
>>Class: sw-bug
>>Submitter-Id: net
>>Arrival-Date: Mon Oct 23 07:45:00 +0000 2017
>>Originator: Ryota Ozaki
>>Release: -current (and maybe earlier releases)
>>Organization:
>>Environment:
> NetBSD rangeley 8.99.3 NetBSD 8.99.3 (RANGELEY) #77: Wed Oct 4 10:23:38 JST 2017 ozaki-r@rangeley:(hidden) amd64
>>Description:
> npfctl reload fails sometimes on rump kernels with EIO like this:
>
> tc-so:Executing command [ env LD_PRELOAD=/usr/lib/librumphijack.so RUMPHIJACK=path=/rump,socket=all:nolocal,sysctl=yes,blanket=/dev/npf npfctl reload ./npf.conf ]
> tc-se:Fail: incorrect exit status: 1, expected: 0
> tc-se:stdout:
> tc-se:
> tc-se:stderr:
> tc-se:npfctl: source line 333501697
> tc-se:npfctl: object: 535
> tc-se:npfctl: npfctl_config_send: Input/output error
> tc-se:
>
> The error message is generated by libnpf that uses libprop.
>
> npfctl uses npf_config_submit of libnpf to call ioctl to the kernel.
> libnpf uses prop_dictionary_sendrecv_ioctl to call ioctl and
> get an errno of the ioctl by prop_dictionary_get_int32.
> On npfctl reload on rump kernels, prop_dictionary_sendrecv_ioctl succeeds
> but prop_dictionary_get_int32 fails because errno isn't set correctly.
>
> It seems that npfctl_load that is a kernel function of npf and called
> via the prop_dictionary_sendrecv_ioctl doesn't set the errno to the return
> value of libprop: https://nxr.netbsd.org/xref/src/sys/net/npf/npf_ctl.c#635
>
> /* Error report. */
> #if !defined(_NPF_TESTING) && !defined(_NPF_STANDALONE)
> prop_dictionary_set_int32(errdict, "errno", error);
> prop_dictionary_copyout_ioctl(pref, cmd, errdict);
> prop_object_release(errdict);
> error = 0;
> #endif
> return error;
>
> The code setting an errno is trimmed on rump kernels since by default
> rump kernels set _NPF_TESTING. If we enable the code, npfctl reload
> works expectedly.
>
> Note that npfctl reload sometimes succeeds, I guess, because npf_config_submit
> tries to read an errno from a uninitialized memory and it sometimes works luckily.
>>How-To-Repeat:
> # From an ATF test script that is under development
>
> rump_server -lrumpnet -lrumpnet_net -lrumpnet_netinet -lrumpnet_shmif -lrumpdev -lrumpvfs -lrumpdev_bpf -lrumpnet_bpfjit -lrumpnet_npf -lrumpnet_netinet6 unix://npf_nat_local
>
> export RUMP_SERVER=unix://npf_nat_local
> rump.ifconfig shmif0 create
> rump.ifconfig shmif0 linkstr ./bus_npf_local
> rump.ifconfig shmif1 create
> rump.ifconfig shmif1 linkstr ./bus_npf_nat
>
> rump.sysctl -q -w net.inet.ip.dad_count=0
> rump.ifconfig shmif0 10.0.1.1/24
> rump.ifconfig shmif1 20.0.0.1/24
> rump.sysctl -q -w net.inet.ip.forwarding=1
> rump.route -n add -net 10.0.2.0 20.0.0.2
>
> cat > ./npf.conf <<-EOF
> set bpf.jit off
> \$int_if = inet4(shmif0)
> \$ext_if = inet4(shmif1)
> \$localnet = { 10.0.1.0/24 }
> map \$ext_if dynamic \$localnet -> \$ext_if
> group "external" on \$ext_if {
> pass stateful out final all
> }
> group "internal" on \$int_if {
> block in all
> pass in final from \$localnet
> pass out final all
> }
> group default {
> pass final on lo0 all
> block all
> }
> EOF
>
> env LD_PRELOAD=/usr/lib/librumphijack.so RUMPHIJACK=path=/rump,socket=all:nolocal,sysctl=yes,blanket=/dev/npf npfctl reload ./npf.conf
>>Fix:
> As described above, enabling the code fixes the issue, but I'm not sure
> if the fix is appropriate.
>
> diff --git a/sys/net/npf/npf_ctl.c b/sys/net/npf/npf_ctl.c
> index b1af2d347ac..43105db9c85 100644
> --- a/sys/net/npf/npf_ctl.c
> +++ b/sys/net/npf/npf_ctl.c
> @@ -633,12 +633,12 @@ fail:
> prop_object_release(npf_dict);
>
> /* Error report. */
> -#if !defined(_NPF_TESTING) && !defined(_NPF_STANDALONE)
> +//#if !defined(_NPF_TESTING) && !defined(_NPF_STANDALONE)
> prop_dictionary_set_int32(errdict, "errno", error);
> prop_dictionary_copyout_ioctl(pref, cmd, errdict);
> prop_object_release(errdict);
> error = 0;
> -#endif
> +//#endif
> return error;
> }
>
Further investigations found the following requirements:
- prop_object_release(errdict) must be always called
because errdict = prop_dictionary_create() is always called
- prop_object_release(npf_dict) needs to be called only if
npf_dict is allocated by prop_dictionary_copyin_ioctl_size
- _NPF_STANDALONE doesn't require to set prop
- For _NPF_TESTING, if npfctl_testing, setting prop isn't
needed, otherwise it's needed
So I think a fix looks like the following patch. Comments?
ozaki-r
--
diff --git a/sys/net/npf/npf_ctl.c b/sys/net/npf/npf_ctl.c
index b1af2d347ac..3e82db6bf73 100644
--- a/sys/net/npf/npf_ctl.c
+++ b/sys/net/npf/npf_ctl.c
@@ -630,15 +630,25 @@ fail:
if (tblset) {
npf_tableset_destroy(tblset);
}
- prop_object_release(npf_dict);
+#if defined(_NPF_TESTING) || defined(_NPF_STANDALONE)
+ if (!npfctl_testing)
+#endif
+ prop_object_release(npf_dict);
- /* Error report. */
-#if !defined(_NPF_TESTING) && !defined(_NPF_STANDALONE)
- prop_dictionary_set_int32(errdict, "errno", error);
- prop_dictionary_copyout_ioctl(pref, cmd, errdict);
+#ifndef _NPF_STANDALONE
+#ifdef _NPF_TESTING
+ if (!npfctl_testing) {
+#endif
+ /* Error report. */
+ prop_dictionary_set_int32(errdict, "errno", error);
+ prop_dictionary_copyout_ioctl(pref, cmd, errdict);
+ error = 0;
+#ifdef _NPF_TESTING
+ }
+#endif
+#endif /* _NPF_STANDALONE */
prop_object_release(errdict);
- error = 0;
-#endif
+
return error;
}
Home |
Main Index |
Thread Index |
Old Index