NetBSD-Bugs archive


Re: kern/56550: swapctl: SWAP_STATS different to SWAP_NSWAP (1 != 3)



Hi Matthew,

I will do both of your suggestions (custom kernel config, patch uvm_swap_stats() ).

However, I noticed something unusual yesterday with sysctl.  I have the line "sysctl machdep.cpu.frequency.current" in my .zshrc file so that I see my CPU's current frequency at login, and that works the first time I log in.  Often, though, when I start a second (or third, or ...) ssh session, the login hangs at that line; ^T shows this:

< many identical lines before from ^T, differing only in the time >

[ 108947.5647601] load: 1.07  cmd: sysctl 9875 [tstile] 0.00u 0.51s 0% 1308k
[ 108947.5977861] load: 1.07  cmd: sysctl 9875 [tstile] 0.00u 0.51s 0% 1308k
^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C
^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z
[ 108955.9438290] load: 1.06  cmd: sysctl 9875 [tstile] 0.00u 0.51s 0% 1308k

< many identical lines after from ^T, differing only in the time >

sysctl sits there forever in this 'tstile' state.

Note that ^C and ^Z seem only to be echoed, not caught.  (I did not report this because I cannot (yet) reliably reproduce it after I reboot and try again; it has happened about three times so far, and a reboot fixes it.)


I'm now wondering if this isn't some specific Raspberry Pi 4A hardware problem -- because I, too, tried continuous "swapctl -l" loops with no failures.

I'm wondering if there's something about the way the sysctl database allows member access?   I don't think the swapctl program is at fault -- I think it is getting something bogus and simply reporting the bogosity.
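
To take swapctl(8) itself out of the picture, the same cross-check could be run from a small standalone program.  What follows is only a minimal sketch, assuming the swapctl(2) interface from <sys/swap.h> (SWAP_NSWAP, SWAP_STATS, struct swapent); the helper check_counts() and the iteration cap are made up for illustration and are not part of swapctl(8).  The main() loop is only built on NetBSD; elsewhere just the helper compiles.

```c
/* Sketch only: compare the two counts that "swapctl -l" cross-checks. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#ifdef __NetBSD__
#include <sys/swap.h>	/* SWAP_NSWAP, SWAP_STATS, struct swapent */
#endif

/*
 * Hypothetical helper (not from swapctl(8)): returns 0 if the two
 * counts agree, or -1 after reporting the mismatch on stderr.
 */
static int
check_counts(int nswap, int nstats, unsigned long iter)
{
	if (nswap == nstats)
		return 0;
	fprintf(stderr, "iteration %lu: SWAP_NSWAP=%d, SWAP_STATS=%d\n",
	    iter, nswap, nstats);
	return -1;
}

#ifdef __NetBSD__
int
main(void)
{
	/* The iteration cap is arbitrary; raise it for a longer soak. */
	for (unsigned long i = 0; i < 100000; i++) {
		int nswap = swapctl(SWAP_NSWAP, NULL, 0);
		if (nswap < 0) {
			perror("swapctl(SWAP_NSWAP)");
			return 1;
		}

		/* Allocate at least one entry so calloc(0) is never relied on. */
		struct swapent *sep =
		    calloc(nswap > 0 ? (size_t)nswap : 1, sizeof(*sep));
		if (sep == NULL) {
			perror("calloc");
			return 1;
		}

		int nstats = swapctl(SWAP_STATS, sep, nswap);
		free(sep);
		if (nstats < 0) {
			perror("swapctl(SWAP_STATS)");
			return 1;
		}

		/* Stop at the first disagreement so the state can be inspected. */
		if (check_counts(nswap, nstats, i) != 0)
			return 2;
	}
	return 0;
}
#endif
```

If this loop also reports a mismatch now and then, the bogus value is coming from the kernel side rather than from anything swapctl(8) does with it.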


Thanks for banging on this; these 'intermittents' are incredible pains.  If it's real, there is likely some 'race' somewhere.   If it's HW, well... sorry.  But, gosh, everything else is rock-solid.

Let me collect more data; don't pull your hair out!

Thanks again,
Mike



On Sun, Dec 26, 2021 at 11:10 PM matthew green <mrg%eterna.com.au@localhost> wrote:
The following reply was made to PR bin/56550; it has been noted by GNATS.

From: matthew green <mrg%eterna.com.au@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost, mac%culver.net@localhost
Subject: re: kern/56550: swapctl: SWAP_STATS different to SWAP_NSWAP (1 != 3)
Date: Mon, 27 Dec 2021 18:05:59 +1100

 i really don't see how this can happen without something
 doing swapctl(8) configuration in the middle.  i let a
 system run swapctl -l a few 10,000s of times with one
 device and 2 files configured, no problem.

 the code literally just returns "uvmexp.nswapdev", and
 that only changes when swap devices are added or removed,
 so maybe something internal is weird (though the transient
 nature of this is very odd.)

 more things you could do:  build a kernel with:

    options UVMHIST

 and when you observe the failure, run "vmstat -u pdhist",
 and see what these entries show:

 ---
 if (SCARG(uap, cmd) == SWAP_NSWAP) {
         const int nswapdev = uvmexp.nswapdev;
         UVMHIST_LOG(pdhist, "<- done SWAP_NSWAP=%jd", nswapdev,
             0, 0, 0);
         *retval = nswapdev;
         return 0;
 }
 ---

 you could also patch uvm_swap_stats() to print something if
 it gets to this line:

         *retval = count;

 right before the return, and check that "count == misc",
 printing them both if mismatched.  if this happens, the
 next step would be to look deeper into the double loop in this
 function and see what it is doing wrong.


 .mrg.


