NetBSD-Bugs archive


port-hppa/56849: Wacko kernel memory accounting in current/hppa



>Number:         56849
>Category:       port-hppa
>Synopsis:       Wacko kernel memory accounting in current/hppa
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-hppa-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun May 22 18:15:00 +0000 2022
>Originator:     Tom Lane
>Release:        HEAD/202205210400Z
>Organization:
PostgreSQL Global Development Group
>Environment:
NetBSD sss2.sss.pgh.pa.us 9.99.96 NetBSD 9.99.96 (TGL) #0: Sat May 21 16:50:57 EDT 2022  tgl%nuc1.sss.pgh.pa.us@localhost:/home/tgl/netbsd-H-202205210400Z/obj.hppa/sys/arch/hppa/compile/TGL hppa

This kernel is custom only in that its config selects a particular root device, so that I don't have to intervene manually when netbooting (the root filesystem is on a USB drive that the boot ROM doesn't know how to boot from).
>Description:
During and after a fairly long, disk-access-intensive test suite run, I observe completely wacko reports from top(1) and ps(1) about the kernel's memory consumption.  With the machine sitting idle post-run, top says

load averages:  0.00,  0.00,  0.00;               up 0+19:01:17        12:37:47
19 processes: 18 sleeping, 1 on CPU
CPU states:  0.0% user,  0.0% nice,  1.5% system,  0.0% interrupt, 98.5% idle
Memory: 38M Act, 4852K Inact, 26M Wired, 8456K Exec, 32M File, 296M Free
Swap: 1024M Total, 9068K Used, 1015M Free

  PID USERNAME PRI NICE   SIZE   RES STATE       TIME   WCPU    CPU COMMAND
    0 root     125    0     0K   79G vdrain     60:25  0.00%  0.00% [system]
 ...

"79G"?  This machine only has 512M RAM, plus the 1024M swap partition.
The "Memory:" line is less obviously implausible, but it's still not right, because those numbers only sum to about 405M.
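
The sum quoted above can be checked with a few lines of Python. This is only a sketch: the values are copied from the "Memory:" line of the top output, the variable names are made up for illustration, and it treats the six categories as disjoint (as the report's own arithmetic does), which top may not actually guarantee.

```python
# Values copied from the "Memory:" line of the top(1) output in this report.
# Assumption: the six categories are disjoint and can simply be added up.
KIB_PER_MIB = 1024

memory_kib = {
    "Act": 38 * KIB_PER_MIB,
    "Inact": 4852,
    "Wired": 26 * KIB_PER_MIB,
    "Exec": 8456,
    "File": 32 * KIB_PER_MIB,
    "Free": 296 * KIB_PER_MIB,
}

physical_mib = 512  # installed RAM, per the report
total_mib = sum(memory_kib.values()) / KIB_PER_MIB
unaccounted_mib = physical_mib - total_mib

print(f"sum of categories: {total_mib:.1f}M")
print(f"not covered by any category: {unaccounted_mib:.1f}M of {physical_mib}M")
```

Under that assumption, roughly a fifth of the machine's RAM does not show up in any category of the "Memory:" line.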

"ps auxww" seems to be looking at the same incorrect estimate, though
it presents it much differently:

USER        PID %CPU   %MEM   VSZ     RSS TTY   STAT STARTED     TIME COMMAND
root          0  0.0 15781.9     0 3050660 ?     DKl   5:37PM 60:26.58 [system]
...

... or wait, that %MEM estimate tracks pretty closely with 79G, but
that RSS value only amounts to about 3G, doesn't it?
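
One way to look at the relationship between the three figures is the arithmetic sketch below. It rests on assumptions that are not confirmed anywhere in this report: a 4096-byte page size, ps reporting RSS in kilobytes, and, as a pure hypothesis, the ps RSS column being the true byte count wrapped modulo 2^32. All names in the code are hypothetical, and nothing here identifies the actual cause.

```python
# Figures copied from the top(1) and ps(1) output in this report.
PAGE_SIZE = 4096            # ASSUMPTION: 4 KiB pages
PHYS_BYTES = 512 * 2**20    # 512M RAM, per the report
ps_mem_percent = 15781.9    # %MEM column from ps
ps_rss_kib = 3050660        # RSS column from ps, ASSUMED to be in KiB

phys_pages = PHYS_BYTES // PAGE_SIZE

# What %MEM alone implies about resident size.
implied_kib = ps_mem_percent / 100 * PHYS_BYTES / 1024
print(f"%MEM implies about {implied_kib / 2**20:.1f} GiB resident")
print(f"RSS column is about {ps_rss_kib / 2**20:.1f} GiB")

# HYPOTHESIS (not from the report): RSS is the byte count modulo 2**32.
wrap_kib = 2**32 // 1024
k = round((implied_kib - ps_rss_kib) / wrap_kib)   # number of wraparounds
candidate_kib = ps_rss_kib + k * wrap_kib
candidate_pages = candidate_kib * 1024 // PAGE_SIZE
candidate_percent = 100 * candidate_pages / phys_pages

print(f"wraparounds needed: {k}")
print(f"candidate size: {candidate_kib / 2**20:.1f} GiB "
      f"({candidate_pages} pages)")
print(f"%MEM recomputed from candidate: {candidate_percent:.1f}")
```

If the hypothesis holds, a whole number of wraparounds added to the ps RSS value gives a size that rounds to top's 79G and reproduces the 15781.9 %MEM figure, which would mean ps and top are reading the same inflated page count and only the RSS column is truncated. That is a consistency check on the numbers, not a diagnosis.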

These numbers appear to have crept up slowly since boot but then
stabilized at what I'm showing here.  I'm guessing some sort of
pseudo-leak in memory accounting, but not actual memory consumption.
This test suite is extremely file-access-heavy, so something wrong in
vnode accounting could fit the facts perhaps.  The suite ran a good
deal slower than I was hoping for, almost double the time it takes
under HP-UX on the same hardware, so I'm wondering if this bogus
accounting is having real performance effects somewhere.

For comparison's sake I looked at a nearby machine running 9.2/amd64,
and there, top and ps agree that the kernel is using about 26M which
seems generally sane.

>How-To-Repeat:
I was running Postgres "make check-world", but I doubt it's specific to this exact workload.
>Fix:
Nick Hudson speculates there's something wrong in sys/arch/hppa/hppa/pmap.c


