NetBSD-Bugs archive
Re: kern/53591: [system] process uses >400% CPU on idle machine
The following reply was made to PR kern/53591; it has been noted by GNATS.
From: Lars Reichardt <lars%paradoxon.info@localhost>
To: gnats-bugs%NetBSD.org@localhost, kern-bug-people%netbsd.org@localhost,
gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost
Cc:
Subject: Re: kern/53591: [system] process uses >400% CPU on idle machine
Date: Tue, 11 Sep 2018 12:09:35 +0200
On 9/11/18 10:50 AM, Andreas Gustafsson wrote:
>> Number: 53591
>> Category: kern
>> Synopsis: [system] process uses >400% CPU on idle machine
>> Confidential: no
>> Severity: serious
>> Priority: high
>> Responsible: kern-bug-people
>> State: open
>> Class: sw-bug
>> Submitter-Id: net
>> Arrival-Date: Tue Sep 11 08:50:00 +0000 2018
>> Originator: Andreas Gustafsson
>> Release: NetBSD 8.0
>> Organization:
>> Environment:
> System: NetBSD guido
> Architecture: x86_64
> Machine: amd64
>> Description:
> My 12-core HP DL360 G7 system running NetBSD/amd64 8.0 has now somehow
> gotten itself into a state where the [system] process is using >400%
> CPU even though the system is idle. "top" shows:
>
> load averages: 0.00, 0.00, 0.80; up 1+18:48:30
> 51 processes: 45 sleeping, 4 stopped, 2 on CPU
> CPU states: 0.0% user, 0.0% nice, 34.8% system, 0.0% interrupt, 65.1% idle
> Memory: 20G Act, 10G Inact, 348K Wired, 33M Exec, 4875M File, 62M Free
> Swap:
>
> PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
> 0 root 0 0 0K 133M CPU/11 507:36 0.00% 353% [system]
> 484 pgsql 85 0 77M 4572K select/7 2:45 0.00% 0.00% postgres
> 6099 gson 85 0 95M 3020K select/6 0:58 0.00% 0.00% sshd
>
> Pressing the "t" key shows that the kernel threads eating CPU are
> the pgdaemon and xcall threads:
>
> load averages: 0.00, 0.00, 0.76; up 1+18:49:12
> 217 threads: 49 idle, 1 runnable, 146 sleeping, 8 stopped, 1 zombie, 12 on CPU
> CPU states: 0.0% user, 0.0% nice, 35.8% system, 0.0% interrupt, 64.1% idle
> Memory: 20G Act, 10G Inact, 348K Wired, 33M Exec, 4875M File, 62M Free
> Swap:
>
> PID LID USERNAME PRI STATE TIME WCPU CPU NAME COMMAND
> 0 7 root 127 xcall/0 43:21 61.96% 61.96% xcall/0 [system]
> 0 22 root 127 xcall/1 42:08 47.22% 47.22% xcall/1 [system]
> 0 28 root 127 xcall/2 39:35 42.97% 42.97% xcall/2 [system]
> 0 34 root 127 RUN/3 34:54 31.59% 31.59% xcall/3 [system]
> 0 52 root 127 xcall/6 29:36 30.96% 30.96% xcall/6 [system]
> 0 58 root 127 xcall/7 28:53 29.88% 29.88% xcall/7 [system]
> 0 70 root 127 xcall/9 26:41 29.69% 29.69% xcall/9 [system]
> 0 64 root 127 xcall/8 26:46 29.49% 29.49% xcall/8 [system]
> 0 156 root 126 xclocv/1 92:15 29.44% 29.44% pgdaemon [system]
> 0 82 root 127 xcall/11 24:05 28.47% 28.47% xcall/11 [system]
> 0 46 root 127 xcall/5 31:20 28.12% 28.12% xcall/5 [system]
> 0 40 root 127 xcall/4 30:48 25.29% 25.29% xcall/4 [system]
> 0 76 root 127 xcall/10 24:03 25.05% 25.05% xcall/10 [system]
> 0 157 root 124 syncer/4 22:45 0.00% 0.00% ioflush [system]
> 0 158 root 125 aiodon/9 5:12 0.00% 0.00% aiodoned [system]
> 0 84 root 96 ipmicm/1 5:04 0.00% 0.00% ipmi [system]
> 484 1 pgsql 85 select/2 2:45 0.00% 0.00% - postgres
> 0 9 root 125 vdrain/1 1:17 0.00% 0.00% vdrain [system]
> 0 159 root 123 physio/0 1:12 0.00% 0.00% physiod [system]
>
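The xcall/N entries in the listing above are NetBSD's per-CPU cross-call
service threads (see xcall(9)): they sit idle until some subsystem asks to
run a function on every CPU, so sustained CPU time there means something is
broadcasting cross-calls in a loop. A minimal sketch of such a broadcast,
with an illustrative callback name (count_up) that is not taken from any
real kernel path:

#include <sys/atomic.h>
#include <sys/xcall.h>

static uint64_t xc_hits;

static void
count_up(void *arg1, void *arg2)
{
	/* Runs once on every CPU, in that CPU's xcall/N thread. */
	atomic_inc_64(&xc_hits);
}

static void
broadcast_example(void)
{
	/*
	 * flags == 0 selects the low-priority path, i.e. the per-CPU
	 * xcall/N kernel threads visible in the listing above.
	 */
	uint64_t where = xc_broadcast(0, count_up, NULL, NULL);

	xc_wait(where);	/* returns once every CPU has run count_up() */
}

One plausible, but unproven, reading of the listing is that the pagedaemon
is looping trying to reclaim kernel memory and issuing such broadcasts on
every pass (pool cache invalidation uses this mechanism), which would
account for both the pgdaemon and the xcall/N times shown.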
> Output from "vmstat 1":
>
> procs memory page disks faults cpu
> r b avm fre flt re pi po fr sr l0 s0 in sy cs us sy id
> 1 8 21024468 74920 15313 1 0 0 191 532 79 44 170 11879 38629 3 3 93
> 0 8 21024468 74920 1 0 0 0 0 0 0 0 8 121 960529 0 36 64
> 0 8 21024468 74668 613 0 0 0 0 0 0 3 27 316 951463 0 37 63
> 0 8 21024468 74672 0 0 0 0 0 0 0 0 3 25 958574 0 37 63
> 0 8 21024468 74672 0 0 0 0 0 0 0 0 2 28 962733 0 35 65
> 0 8 21024468 74940 0 0 0 0 0 0 0 0 2 25 957158 0 36 64
> 0 8 21024468 74940 0 0 0 0 0 0 0 0 4 106 953688 0 37 63
>
> I will try to avoid rebooting for 24 hours in case someone wants me to
> run other diagnostics.
>
>> How-To-Repeat:
> Don't know, this has only happened once so far. I had been using dtrace,
> so maybe that's what triggered it. Or not.
>
>> Fix:
How much memory does the machine have? Maybe some pools (those with
larger-than-PAGE_SIZE allocators) have eaten all of the kmem_va space.
What does vmstat -mv show?
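If that is what happened, the vmstat -m output should show one or more
pools whose page counts dominate. For background, a pool whose item size
exceeds PAGE_SIZE is backed by one of the multi-page allocators that
pool(9) selects when no explicit allocator is given, and those pages come
out of kmem_va. A minimal sketch of creating such a pool, with a
hypothetical pool name:

#include <sys/param.h>
#include <sys/intr.h>
#include <sys/pool.h>

static struct pool example_pool;

static void
example_pool_init(void)
{
	/*
	 * Hypothetical pool of 16 KiB items.  Since 16 KiB exceeds
	 * PAGE_SIZE (4 KiB on amd64), the NULL allocator argument
	 * makes pool(9) fall back to a multi-page backing allocator;
	 * its pages are the kmem_va usage asked about above.
	 */
	pool_init(&example_pool, 16 * 1024, 0, 0, 0, "exmplpl",
	    NULL, IPL_NONE);
}

Every pool_get() that misses then pulls whole pages from that backing
allocator, so a leak in such a pool would show up as kmem_va exhaustion
rather than as any ordinary process growing in top.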