NetBSD-Bugs archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: kern/56550: swapctl: SWAP_STATS different to SWAP_NSWAP (1 != 3)
The following reply was made to PR bin/56550; it has been noted by GNATS.
From: Michael Cheponis <michael.cheponis%gmail.com@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost, mac%culver.net@localhost
Subject: Re: kern/56550: swapctl: SWAP_STATS different to SWAP_NSWAP (1 != 3)
Date: Mon, 27 Dec 2021 13:19:35 -0800
--0000000000001cb4af05d42745d5
Content-Type: text/plain; charset="UTF-8"
Hi Matthew,
I will do both of your suggestions (custom kernel config, patch
uvm_swap_stats() ).
However, I noticed something unusual yesterday with sysctl: I have the
line "sysctl machdep.cpu.frequency.current" in my .zshrc file so I should
get my cpu's current frequency, and that works the 1st time I log in.
However, often, when I start a 2nd (or 3rd or...) ssh session, the machine
hangs that login at that line; ^T shows this:
*< many same lines before with ^T, just diff systime >*
*[ 108947.5647601] load: 1.07 cmd: sysctl 9875 [tstile] 0.00u 0.51s 0%
1308k*
*[ 108947.5977861] load: 1.07 cmd: sysctl 9875 [tstile] 0.00u 0.51s 0%
1308k*
*^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C*
*^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z*
*[ 108955.9438290] load: 1.06 cmd: sysctl 9875 [tstile] 0.00u 0.51s 0%
1308k*
*< many same lines after with ^T, just diff systime >*
sysctl sits there forever in this 'tstile' state.
Note that ^C and ^Z seem to only echo, and not get caught. (I did not
report this as I cannot (yet) reliably reproduce this after I reboot and
try again; but has happened about 3x so far, reboot fixes it)
I'm now wondering if this isn't some specific Raspberry Pi 4A hardware
problem -- because I, too, tried continuous "swapctl -l" loops with no
failures.
I'm wondering if there's something about the way the sysctl database allows
member access? I don't think the swapctl program is at fault -- I think
it is getting something bogus and simply reporting the bogosity.
Thanks for banging on this; these 'intermittents' are incredible pains. If
it's real, there is likely some 'race' somewhere. If it's HW, well...
sorry. But, gosh, everything else is rock-solid.
Let me collect more data; don't pull your hair out!
Thanks again,
Mike
On Sun, Dec 26, 2021 at 11:10 PM matthew green <mrg%eterna.com.au@localhost> wrote:
> The following reply was made to PR bin/56550; it has been noted by GNATS.
>
> From: matthew green <mrg%eterna.com.au@localhost>
> To: gnats-bugs%netbsd.org@localhost
> Cc: gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost, mac%culver.net@localhost
> Subject: re: kern/56550: swapctl: SWAP_STATS different to SWAP_NSWAP (1 !=
> 3)
> Date: Mon, 27 Dec 2021 18:05:59 +1100
>
> i really don't see how this can happen without something
> doing swapctl(8) configuration in the middle. i let a
> system run swapctl -l a few 10,000s of times with one
> device and 2 files configured, no problem.
>
> the code literally just returns "uvmexp.nswapdev", and
> that only changes when swap devices are added or removed,
> so maybe something internal is weird (though the transient
> nature of this is very odd.)
>
> more things you could do: build a kernel with:
>
> options UVMHIST
>
> and when you observe the failure, run "vmstat -u pdhist",
> and see what these entries show:
>
> ---
> if (SCARG(uap, cmd) == SWAP_NSWAP) {
> const int nswapdev = uvmexp.nswapdev;
> UVMHIST_LOG(pdhist, "<- done SWAP_NSWAP=%jd", nswapdev,
> 0, 0, 0);
> *retval = nswapdev;
> return 0;
> ---
>
> you could also patch uvm_swap_stats() to print something if
> it gets to thsi line:
>
> *retval = count;
>
> right before the return, and check that "count == misc",
> printing them both if mismatched. if this happens, the
> next step would be to look deeper into double-loop in this
> function and see what it is doing wrong.
>
>
> .mrg.
>
>
--0000000000001cb4af05d42745d5
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
<div dir=3D"ltr"><div class=3D"gmail_default" style=3D"font-family:arial,he=
lvetica,sans-serif;font-size:small">Hi Matthew,</div><div class=3D"gmail_de=
fault" style=3D"font-family:arial,helvetica,sans-serif;font-size:small"><br=
></div><div class=3D"gmail_default" style=3D"font-family:arial,helvetica,sa=
ns-serif;font-size:small">I will do both of your suggestions (custom kernel=
config,=C2=A0<span style=3D"font-family:Arial,Helvetica,sans-serif">patch =
uvm_swap_stats() ).</span></div><div class=3D"gmail_default" style=3D"font-=
family:arial,helvetica,sans-serif;font-size:small"><span style=3D"font-fami=
ly:Arial,Helvetica,sans-serif"><br></span></div><div class=3D"gmail_default=
" style=3D"font-family:arial,helvetica,sans-serif;font-size:small"><span st=
yle=3D"font-family:Arial,Helvetica,sans-serif">However, I noticed something=
=C2=A0unusual yesterday with sysctl:=C2=A0 I have the line "sysctl=C2=
=A0</span><span style=3D"font-family:Arial,Helvetica,sans-serif">machdep.cp=
u.frequency.current" in my .zshrc file so I should get my cpu's cu=
rrent frequency, and that works the 1st time I log in.=C2=A0 However, often=
, when I start a 2nd (or 3rd or...) ssh session, the machine hangs that log=
in at that line; ^T shows this:</span></div><div class=3D"gmail_default" st=
yle=3D"font-family:arial,helvetica,sans-serif;font-size:small"><span style=
=3D"font-family:Arial,Helvetica,sans-serif"><br></span></div><blockquote st=
yle=3D"margin:0 0 0 40px;border:none;padding:0px"><div class=3D"gmail_defau=
lt" style=3D""><font face=3D"monospace" style=3D"" size=3D"1"><b>< many =
same lines before with ^T, just diff systime ></b></font></div><div clas=
s=3D"gmail_default" style=3D""><font face=3D"monospace" size=3D"1"><b><br><=
/b></font></div><div class=3D"gmail_default" style=3D""><font face=3D"monos=
pace" size=3D"1"><b>[ 108947.5647601] load: 1.07 =C2=A0cmd: sysctl 9875 [ts=
tile] 0.00u 0.51s 0% 1308k</b></font></div><div class=3D"gmail_default" sty=
le=3D""><font face=3D"monospace" size=3D"1"><b>[ 108947.5977861] load: 1.07=
=C2=A0cmd: sysctl 9875 [tstile] 0.00u 0.51s 0% 1308k</b></font></div><div =
class=3D"gmail_default" style=3D""><font face=3D"monospace" size=3D"1"><b>^=
C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C=
^C^C^C^C^C^C^C^C^C^C^C^C^C</b></font></div><div class=3D"gmail_default" sty=
le=3D""><font face=3D"monospace" size=3D"1"><b>^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z=
^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z</b></font></=
div><div class=3D"gmail_default" style=3D""><font face=3D"monospace" size=
=3D"1"><b>[ 108955.9438290] load: 1.06 =C2=A0cmd: sysctl 9875 [tstile] 0.00=
u 0.51s 0% 1308k</b></font></div><div class=3D"gmail_default" style=3D""><d=
iv class=3D"gmail_default"><font face=3D"monospace" size=3D"1"><b><br></b><=
/font></div></div><div class=3D"gmail_default" style=3D""><div class=3D"gma=
il_default" style=3D""><font face=3D"monospace" style=3D"" size=3D"1"><b>&l=
t; many same lines after with ^T, just diff systime ></b></font></div></=
div></blockquote><div class=3D"gmail_default" style=3D"font-family:arial,he=
lvetica,sans-serif;font-size:small"><br class=3D"gmail-Apple-interchange-ne=
wline"></div><div class=3D"gmail_default" style=3D"font-family:arial,helvet=
ica,sans-serif;font-size:small">sysctl sits there forever in this 'tsti=
le' state.</div><div class=3D"gmail_default" style=3D"font-family:arial=
,helvetica,sans-serif;font-size:small"><br></div><div class=3D"gmail_defaul=
t" style=3D"font-family:arial,helvetica,sans-serif;font-size:small">Note th=
at ^C and ^Z seem to only echo, and not get caught.=C2=A0 =C2=A0(I did not =
report this as I cannot (yet) reliably reproduce this after I reboot and tr=
y again; but has happened about 3x so far, reboot fixes it)</div><div class=
=3D"gmail_default" style=3D"font-family:arial,helvetica,sans-serif;font-siz=
e:small"><br></div><div class=3D"gmail_default" style=3D"font-family:arial,=
helvetica,sans-serif;font-size:small"><br></div><div class=3D"gmail_default=
" style=3D"font-family:arial,helvetica,sans-serif;font-size:small">I'm =
now wondering if this isn't some specific Raspberry Pi 4A hardware prob=
lem -- because I, too, tried continuous=C2=A0"swapctl -l" loops w=
ith no failures.</div><div class=3D"gmail_default" style=3D"font-family:ari=
al,helvetica,sans-serif;font-size:small"><br></div><div class=3D"gmail_defa=
ult" style=3D"font-family:arial,helvetica,sans-serif;font-size:small">I'=
;m wondering if there's something about the way the sysctl database all=
ows member access?=C2=A0 =C2=A0I don't think the swapctl program is at =
fault -- I think it is getting something bogus and simply reporting the bog=
osity.</div><div class=3D"gmail_default" style=3D"font-family:arial,helveti=
ca,sans-serif;font-size:small"><br></div><div class=3D"gmail_default" style=
=3D"font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div =
class=3D"gmail_default" style=3D"font-family:arial,helvetica,sans-serif;fon=
t-size:small">Thanks for banging on this; these 'intermittents' are=
incredible pains.=C2=A0 If it's real, there is likely some 'race&#=
39; somewhere.=C2=A0 =C2=A0If it's HW, well... sorry.=C2=A0 But, gosh, =
everything else is rock-solid.</div><div class=3D"gmail_default" style=3D"f=
ont-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class=
=3D"gmail_default" style=3D"font-family:arial,helvetica,sans-serif;font-siz=
e:small">Let me collect more data; don't pull your hair out!</div><div =
class=3D"gmail_default" style=3D"font-family:arial,helvetica,sans-serif;fon=
t-size:small"><br></div><div class=3D"gmail_default" style=3D"font-family:a=
rial,helvetica,sans-serif;font-size:small">Thanks again,</div><div class=3D=
"gmail_default" style=3D"font-family:arial,helvetica,sans-serif;font-size:s=
mall">Mike</div><div class=3D"gmail_default" style=3D"font-family:arial,hel=
vetica,sans-serif;font-size:small"><br></div><div class=3D"gmail_default" s=
tyle=3D"font-family:arial,helvetica,sans-serif;font-size:small"><br></div><=
/div><br><div class=3D"gmail_quote"><div dir=3D"ltr" class=3D"gmail_attr">O=
n Sun, Dec 26, 2021 at 11:10 PM matthew green <<a href=3D"mailto:mrg@ete=
rna.com.au">mrg%eterna.com.au@localhost</a>> wrote:<br></div><blockquote class=3D"=
gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(20=
4,204,204);padding-left:1ex">The following reply was made to PR bin/56550; =
it has been noted by GNATS.<br>
<br>
From: matthew green <<a href=3D"mailto:mrg%eterna.com.au@localhost" target=3D"_bla=
nk">mrg%eterna.com.au@localhost</a>><br>
To: <a href=3D"mailto:gnats-bugs%netbsd.org@localhost" target=3D"_blank">gnats-bugs@n=
etbsd.org</a><br>
Cc: <a href=3D"mailto:gnats-admin%netbsd.org@localhost" target=3D"_blank">gnats-admin=
@netbsd.org</a>, <a href=3D"mailto:netbsd-bugs%netbsd.org@localhost" target=3D"_blank=
">netbsd-bugs%netbsd.org@localhost</a>, <a href=3D"mailto:mac%culver.net@localhost" target=3D"_=
blank">mac%culver.net@localhost</a><br>
Subject: re: kern/56550: swapctl: SWAP_STATS different to SWAP_NSWAP (1 !=
=3D 3)<br>
Date: Mon, 27 Dec 2021 18:05:59 +1100<br>
<br>
=C2=A0i really don't see how this can happen without something<br>
=C2=A0doing swapctl(8) configuration in the middle.=C2=A0 i let a<br>
=C2=A0system run swapctl -l a few 10,000s of times with one<br>
=C2=A0device and 2 files configured, no problem.<br>
<br>
=C2=A0the code literally just returns "uvmexp.nswapdev", and<br>
=C2=A0that only changes when swap devices are added or removed,<br>
=C2=A0so maybe something internal is weird (though the transient<br>
=C2=A0nature of this is very odd.)<br>
<br>
=C2=A0more things you could do:=C2=A0 build a kernel with:<br>
<br>
=C2=A0 =C2=A0 options UVMHIST<br>
<br>
=C2=A0and when you observe the failure, run "vmstat -u pdhist",<b=
r>
=C2=A0and see what these entries show:<br>
<br>
=C2=A0---<br>
=C2=A0if (SCARG(uap, cmd) =3D=3D SWAP_NSWAP) {<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0const int nswapdev =3D uvmexp.nswapdev;<b=
r>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0UVMHIST_LOG(pdhist, "<- done SWAP=
_NSWAP=3D%jd", nswapdev,<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A00, 0, 0);<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0*retval =3D nswapdev;<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0return 0;<br>
=C2=A0---<br>
<br>
=C2=A0you could also patch uvm_swap_stats() to print something if<br>
=C2=A0it gets to thsi line:<br>
<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0*retval =3D count;<br>
<br>
=C2=A0right before the return, and check that "count =3D=3D misc"=
,<br>
=C2=A0printing them both if mismatched.=C2=A0 if this happens, the<br>
=C2=A0next step would be to look deeper into double-loop in this<br>
=C2=A0function and see what it is doing wrong.<br>
<br>
<br>
=C2=A0.mrg.<br>
<br>
</blockquote></div>
--0000000000001cb4af05d42745d5--
Home |
Main Index |
Thread Index |
Old Index