NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: kern/56550: swapctl: SWAP_STATS different to SWAP_NSWAP (1 != 3)



The following reply was made to PR bin/56550; it has been noted by GNATS.

From: Michael Cheponis <michael.cheponis%gmail.com@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost, mac%culver.net@localhost
Subject: Re: kern/56550: swapctl: SWAP_STATS different to SWAP_NSWAP (1 != 3)
Date: Mon, 27 Dec 2021 13:19:35 -0800

 --0000000000001cb4af05d42745d5
 Content-Type: text/plain; charset="UTF-8"
 
 Hi Matthew,
 
 I will do both of your suggestions (custom kernel config, patch
 uvm_swap_stats() ).
 
 However, I noticed something unusual yesterday with sysctl:  I have the
 line "sysctl machdep.cpu.frequency.current" in my .zshrc file so I should
 get my cpu's current frequency, and that works the 1st time I log in.
 However, often, when I start a 2nd (or 3rd or...) ssh session, the machine
 hangs that login at that line; ^T shows this:
 
 *< many same lines before with ^T, just diff systime >*
 
 *[ 108947.5647601] load: 1.07  cmd: sysctl 9875 [tstile] 0.00u 0.51s 0%
 1308k*
 *[ 108947.5977861] load: 1.07  cmd: sysctl 9875 [tstile] 0.00u 0.51s 0%
 1308k*
 *^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C*
 *^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z*
 *[ 108955.9438290] load: 1.06  cmd: sysctl 9875 [tstile] 0.00u 0.51s 0%
 1308k*
 
 *< many same lines after with ^T, just diff systime >*
 
 
 sysctl sits there forever in this 'tstile' state.
 
 Note that ^C and ^Z seem to only echo, and not get caught.   (I did not
 report this as I cannot (yet) reliably reproduce this after I reboot and
 try again; but has happened about 3x so far, reboot fixes it)
 
 
 I'm now wondering if this isn't some specific Raspberry Pi 4A hardware
 problem -- because I, too, tried continuous "swapctl -l" loops with no
 failures.
 
 I'm wondering if there's something about the way the sysctl database allows
 member access?   I don't think the swapctl program is at fault -- I think
 it is getting something bogus and simply reporting the bogosity.
 
 
 Thanks for banging on this; these 'intermittents' are incredible pains.  If
 it's real, there is likely some 'race' somewhere.   If it's HW, well...
 sorry.  But, gosh, everything else is rock-solid.
 
 Let me collect more data; don't pull your hair out!
 
 Thanks again,
 Mike
 
 
 
 On Sun, Dec 26, 2021 at 11:10 PM matthew green <mrg%eterna.com.au@localhost> wrote:
 
 > The following reply was made to PR bin/56550; it has been noted by GNATS.
 >
 > From: matthew green <mrg%eterna.com.au@localhost>
 > To: gnats-bugs%netbsd.org@localhost
 > Cc: gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost, mac%culver.net@localhost
 > Subject: re: kern/56550: swapctl: SWAP_STATS different to SWAP_NSWAP (1 !=
 > 3)
 > Date: Mon, 27 Dec 2021 18:05:59 +1100
 >
 >  i really don't see how this can happen without something
 >  doing swapctl(8) configuration in the middle.  i let a
 >  system run swapctl -l a few 10,000s of times with one
 >  device and 2 files configured, no problem.
 >
 >  the code literally just returns "uvmexp.nswapdev", and
 >  that only changes when swap devices are added or removed,
 >  so maybe something internal is weird (though the transient
 >  nature of this is very odd.)
 >
 >  more things you could do:  build a kernel with:
 >
 >     options UVMHIST
 >
 >  and when you observe the failure, run "vmstat -u pdhist",
 >  and see what these entries show:
 >
 >  ---
 >  if (SCARG(uap, cmd) == SWAP_NSWAP) {
 >          const int nswapdev = uvmexp.nswapdev;
 >          UVMHIST_LOG(pdhist, "<- done SWAP_NSWAP=%jd", nswapdev,
 >              0, 0, 0);
 >          *retval = nswapdev;
 >          return 0;
 >  ---
 >
 >  you could also patch uvm_swap_stats() to print something if
 >  it gets to thsi line:
 >
 >          *retval = count;
 >
 >  right before the return, and check that "count == misc",
 >  printing them both if mismatched.  if this happens, the
 >  next step would be to look deeper into double-loop in this
 >  function and see what it is doing wrong.
 >
 >
 >  .mrg.
 >
 >
 
 --0000000000001cb4af05d42745d5
 Content-Type: text/html; charset="UTF-8"
 Content-Transfer-Encoding: quoted-printable
 
 <div dir=3D"ltr"><div class=3D"gmail_default" style=3D"font-family:arial,he=
 lvetica,sans-serif;font-size:small">Hi Matthew,</div><div class=3D"gmail_de=
 fault" style=3D"font-family:arial,helvetica,sans-serif;font-size:small"><br=
 ></div><div class=3D"gmail_default" style=3D"font-family:arial,helvetica,sa=
 ns-serif;font-size:small">I will do both of your suggestions (custom kernel=
  config,=C2=A0<span style=3D"font-family:Arial,Helvetica,sans-serif">patch =
 uvm_swap_stats() ).</span></div><div class=3D"gmail_default" style=3D"font-=
 family:arial,helvetica,sans-serif;font-size:small"><span style=3D"font-fami=
 ly:Arial,Helvetica,sans-serif"><br></span></div><div class=3D"gmail_default=
 " style=3D"font-family:arial,helvetica,sans-serif;font-size:small"><span st=
 yle=3D"font-family:Arial,Helvetica,sans-serif">However, I noticed something=
 =C2=A0unusual yesterday with sysctl:=C2=A0 I have the line &quot;sysctl=C2=
 =A0</span><span style=3D"font-family:Arial,Helvetica,sans-serif">machdep.cp=
 u.frequency.current&quot; in my .zshrc file so I should get my cpu&#39;s cu=
 rrent frequency, and that works the 1st time I log in.=C2=A0 However, often=
 , when I start a 2nd (or 3rd or...) ssh session, the machine hangs that log=
 in at that line; ^T shows this:</span></div><div class=3D"gmail_default" st=
 yle=3D"font-family:arial,helvetica,sans-serif;font-size:small"><span style=
 =3D"font-family:Arial,Helvetica,sans-serif"><br></span></div><blockquote st=
 yle=3D"margin:0 0 0 40px;border:none;padding:0px"><div class=3D"gmail_defau=
 lt" style=3D""><font face=3D"monospace" style=3D"" size=3D"1"><b>&lt; many =
 same lines before with ^T, just diff systime &gt;</b></font></div><div clas=
 s=3D"gmail_default" style=3D""><font face=3D"monospace" size=3D"1"><b><br><=
 /b></font></div><div class=3D"gmail_default" style=3D""><font face=3D"monos=
 pace" size=3D"1"><b>[ 108947.5647601] load: 1.07 =C2=A0cmd: sysctl 9875 [ts=
 tile] 0.00u 0.51s 0% 1308k</b></font></div><div class=3D"gmail_default" sty=
 le=3D""><font face=3D"monospace" size=3D"1"><b>[ 108947.5977861] load: 1.07=
  =C2=A0cmd: sysctl 9875 [tstile] 0.00u 0.51s 0% 1308k</b></font></div><div =
 class=3D"gmail_default" style=3D""><font face=3D"monospace" size=3D"1"><b>^=
 C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C=
 ^C^C^C^C^C^C^C^C^C^C^C^C^C</b></font></div><div class=3D"gmail_default" sty=
 le=3D""><font face=3D"monospace" size=3D"1"><b>^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z=
 ^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z</b></font></=
 div><div class=3D"gmail_default" style=3D""><font face=3D"monospace" size=
 =3D"1"><b>[ 108955.9438290] load: 1.06 =C2=A0cmd: sysctl 9875 [tstile] 0.00=
 u 0.51s 0% 1308k</b></font></div><div class=3D"gmail_default" style=3D""><d=
 iv class=3D"gmail_default"><font face=3D"monospace" size=3D"1"><b><br></b><=
 /font></div></div><div class=3D"gmail_default" style=3D""><div class=3D"gma=
 il_default" style=3D""><font face=3D"monospace" style=3D"" size=3D"1"><b>&l=
 t; many same lines after with ^T, just diff systime &gt;</b></font></div></=
 div></blockquote><div class=3D"gmail_default" style=3D"font-family:arial,he=
 lvetica,sans-serif;font-size:small"><br class=3D"gmail-Apple-interchange-ne=
 wline"></div><div class=3D"gmail_default" style=3D"font-family:arial,helvet=
 ica,sans-serif;font-size:small">sysctl sits there forever in this &#39;tsti=
 le&#39; state.</div><div class=3D"gmail_default" style=3D"font-family:arial=
 ,helvetica,sans-serif;font-size:small"><br></div><div class=3D"gmail_defaul=
 t" style=3D"font-family:arial,helvetica,sans-serif;font-size:small">Note th=
 at ^C and ^Z seem to only echo, and not get caught.=C2=A0 =C2=A0(I did not =
 report this as I cannot (yet) reliably reproduce this after I reboot and tr=
 y again; but has happened about 3x so far, reboot fixes it)</div><div class=
 =3D"gmail_default" style=3D"font-family:arial,helvetica,sans-serif;font-siz=
 e:small"><br></div><div class=3D"gmail_default" style=3D"font-family:arial,=
 helvetica,sans-serif;font-size:small"><br></div><div class=3D"gmail_default=
 " style=3D"font-family:arial,helvetica,sans-serif;font-size:small">I&#39;m =
 now wondering if this isn&#39;t some specific Raspberry Pi 4A hardware prob=
 lem -- because I, too, tried continuous=C2=A0&quot;swapctl -l&quot; loops w=
 ith no failures.</div><div class=3D"gmail_default" style=3D"font-family:ari=
 al,helvetica,sans-serif;font-size:small"><br></div><div class=3D"gmail_defa=
 ult" style=3D"font-family:arial,helvetica,sans-serif;font-size:small">I&#39=
 ;m wondering if there&#39;s something about the way the sysctl database all=
 ows member access?=C2=A0 =C2=A0I don&#39;t think the swapctl program is at =
 fault -- I think it is getting something bogus and simply reporting the bog=
 osity.</div><div class=3D"gmail_default" style=3D"font-family:arial,helveti=
 ca,sans-serif;font-size:small"><br></div><div class=3D"gmail_default" style=
 =3D"font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div =
 class=3D"gmail_default" style=3D"font-family:arial,helvetica,sans-serif;fon=
 t-size:small">Thanks for banging on this; these &#39;intermittents&#39; are=
  incredible pains.=C2=A0 If it&#39;s real, there is likely some &#39;race&#=
 39; somewhere.=C2=A0 =C2=A0If it&#39;s HW, well... sorry.=C2=A0 But, gosh, =
 everything else is rock-solid.</div><div class=3D"gmail_default" style=3D"f=
 ont-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class=
 =3D"gmail_default" style=3D"font-family:arial,helvetica,sans-serif;font-siz=
 e:small">Let me collect more data; don&#39;t pull your hair out!</div><div =
 class=3D"gmail_default" style=3D"font-family:arial,helvetica,sans-serif;fon=
 t-size:small"><br></div><div class=3D"gmail_default" style=3D"font-family:a=
 rial,helvetica,sans-serif;font-size:small">Thanks again,</div><div class=3D=
 "gmail_default" style=3D"font-family:arial,helvetica,sans-serif;font-size:s=
 mall">Mike</div><div class=3D"gmail_default" style=3D"font-family:arial,hel=
 vetica,sans-serif;font-size:small"><br></div><div class=3D"gmail_default" s=
 tyle=3D"font-family:arial,helvetica,sans-serif;font-size:small"><br></div><=
 /div><br><div class=3D"gmail_quote"><div dir=3D"ltr" class=3D"gmail_attr">O=
 n Sun, Dec 26, 2021 at 11:10 PM matthew green &lt;<a href=3D"mailto:mrg@ete=
 rna.com.au">mrg%eterna.com.au@localhost</a>&gt; wrote:<br></div><blockquote class=3D"=
 gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(20=
 4,204,204);padding-left:1ex">The following reply was made to PR bin/56550; =
 it has been noted by GNATS.<br>
 <br>
 From: matthew green &lt;<a href=3D"mailto:mrg%eterna.com.au@localhost"; target=3D"_bla=
 nk">mrg%eterna.com.au@localhost</a>&gt;<br>
 To: <a href=3D"mailto:gnats-bugs%netbsd.org@localhost"; target=3D"_blank">gnats-bugs@n=
 etbsd.org</a><br>
 Cc: <a href=3D"mailto:gnats-admin%netbsd.org@localhost"; target=3D"_blank">gnats-admin=
 @netbsd.org</a>, <a href=3D"mailto:netbsd-bugs%netbsd.org@localhost"; target=3D"_blank=
 ">netbsd-bugs%netbsd.org@localhost</a>, <a href=3D"mailto:mac%culver.net@localhost"; target=3D"_=
 blank">mac%culver.net@localhost</a><br>
 Subject: re: kern/56550: swapctl: SWAP_STATS different to SWAP_NSWAP (1 !=
 =3D 3)<br>
 Date: Mon, 27 Dec 2021 18:05:59 +1100<br>
 <br>
 =C2=A0i really don&#39;t see how this can happen without something<br>
 =C2=A0doing swapctl(8) configuration in the middle.=C2=A0 i let a<br>
 =C2=A0system run swapctl -l a few 10,000s of times with one<br>
 =C2=A0device and 2 files configured, no problem.<br>
 <br>
 =C2=A0the code literally just returns &quot;uvmexp.nswapdev&quot;, and<br>
 =C2=A0that only changes when swap devices are added or removed,<br>
 =C2=A0so maybe something internal is weird (though the transient<br>
 =C2=A0nature of this is very odd.)<br>
 <br>
 =C2=A0more things you could do:=C2=A0 build a kernel with:<br>
 <br>
 =C2=A0 =C2=A0 options UVMHIST<br>
 <br>
 =C2=A0and when you observe the failure, run &quot;vmstat -u pdhist&quot;,<b=
 r>
 =C2=A0and see what these entries show:<br>
 <br>
 =C2=A0---<br>
 =C2=A0if (SCARG(uap, cmd) =3D=3D SWAP_NSWAP) {<br>
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0const int nswapdev =3D uvmexp.nswapdev;<b=
 r>
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0UVMHIST_LOG(pdhist, &quot;&lt;- done SWAP=
 _NSWAP=3D%jd&quot;, nswapdev,<br>
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A00, 0, 0);<br>
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0*retval =3D nswapdev;<br>
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0return 0;<br>
 =C2=A0---<br>
 <br>
 =C2=A0you could also patch uvm_swap_stats() to print something if<br>
 =C2=A0it gets to thsi line:<br>
 <br>
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0*retval =3D count;<br>
 <br>
 =C2=A0right before the return, and check that &quot;count =3D=3D misc&quot;=
 ,<br>
 =C2=A0printing them both if mismatched.=C2=A0 if this happens, the<br>
 =C2=A0next step would be to look deeper into double-loop in this<br>
 =C2=A0function and see what it is doing wrong.<br>
 <br>
 <br>
 =C2=A0.mrg.<br>
 <br>
 </blockquote></div>
 
 --0000000000001cb4af05d42745d5--
 



Home | Main Index | Thread Index | Old Index