NetBSD-Bugs archive

kern/53591: [system] process uses >400% CPU on idle machine



>Number:         53591
>Category:       kern
>Synopsis:       [system] process uses >400% CPU on idle machine
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Sep 11 08:50:00 +0000 2018
>Originator:     Andreas Gustafsson
>Release:        NetBSD 8.0
>Organization:
>Environment:
System: NetBSD guido
Architecture: x86_64
Machine: amd64
>Description:

My 12-core HP DL360 G7 system running NetBSD/amd64 8.0 has now somehow
gotten itself into a state where the [system] process is using >400%
CPU even though the system is idle.  "top" shows:

  load averages:  0.00,  0.00,  0.80;               up 1+18:48:30
  51 processes: 45 sleeping, 4 stopped, 2 on CPU
  CPU states:  0.0% user,  0.0% nice, 34.8% system,  0.0% interrupt, 65.1% idle
  Memory: 20G Act, 10G Inact, 348K Wired, 33M Exec, 4875M File, 62M Free
  Swap: 

    PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
      0 root       0    0     0K  133M CPU/11   507:36  0.00%   353% [system]
    484 pgsql     85    0    77M 4572K select/7   2:45  0.00%  0.00% postgres
   6099 gson      85    0    95M 3020K select/6   0:58  0.00%  0.00% sshd

Pressing the "t" key in top to switch to the per-thread display shows
that the kernel threads eating the CPU are the pgdaemon and xcall threads:

  load averages:  0.00,  0.00,  0.76;               up 1+18:49:12
  217 threads: 49 idle, 1 runnable, 146 sleeping, 8 stopped, 1 zombie, 12 on CPU
  CPU states:  0.0% user,  0.0% nice, 35.8% system,  0.0% interrupt, 64.1% idle
  Memory: 20G Act, 10G Inact, 348K Wired, 33M Exec, 4875M File, 62M Free
  Swap: 

    PID   LID USERNAME PRI STATE      TIME   WCPU    CPU NAME      COMMAND
      0     7 root     127 xcall/0   43:21 61.96% 61.96% xcall/0   [system]
      0    22 root     127 xcall/1   42:08 47.22% 47.22% xcall/1   [system]
      0    28 root     127 xcall/2   39:35 42.97% 42.97% xcall/2   [system]
      0    34 root     127 RUN/3     34:54 31.59% 31.59% xcall/3   [system]
      0    52 root     127 xcall/6   29:36 30.96% 30.96% xcall/6   [system]
      0    58 root     127 xcall/7   28:53 29.88% 29.88% xcall/7   [system]
      0    70 root     127 xcall/9   26:41 29.69% 29.69% xcall/9   [system]
      0    64 root     127 xcall/8   26:46 29.49% 29.49% xcall/8   [system]
      0   156 root     126 xclocv/1  92:15 29.44% 29.44% pgdaemon  [system]
      0    82 root     127 xcall/11  24:05 28.47% 28.47% xcall/11  [system]
      0    46 root     127 xcall/5   31:20 28.12% 28.12% xcall/5   [system]
      0    40 root     127 xcall/4   30:48 25.29% 25.29% xcall/4   [system]
      0    76 root     127 xcall/10  24:03 25.05% 25.05% xcall/10  [system]
      0   157 root     124 syncer/4  22:45  0.00%  0.00% ioflush   [system]
      0   158 root     125 aiodon/9   5:12  0.00%  0.00% aiodoned  [system]
      0    84 root      96 ipmicm/1   5:04  0.00%  0.00% ipmi      [system]
    484     1 pgsql     85 select/2   2:45  0.00%  0.00% -         postgres
      0     9 root     125 vdrain/1   1:17  0.00%  0.00% vdrain    [system]
      0   159 root     123 physio/0   1:12  0.00%  0.00% physiod   [system]

Output from "vmstat 1"; note the context switch rate in the "cs" column,
roughly 960,000 per second on an otherwise idle machine:

   procs    memory      page                       disks   faults      cpu
   r b      avm    fre  flt  re  pi   po   fr   sr l0 s0   in   sy  cs us sy id
   1 8 21024468  74920 15313  1   0    0  191  532 79 44  170 11879 38629 3 3 93
   0 8 21024468  74920    1   0   0    0    0    0  0  0    8  121 960529 0 36 64
   0 8 21024468  74668  613   0   0    0    0    0  0  3   27  316 951463 0 37 63
   0 8 21024468  74672    0   0   0    0    0    0  0  0    3   25 958574 0 37 63
   0 8 21024468  74672    0   0   0    0    0    0  0  0    2   28 962733 0 35 65
   0 8 21024468  74940    0   0   0    0    0    0  0  0    2   25 957158 0 36 64
   0 8 21024468  74940    0   0   0    0    0    0  0  0    4  106 953688 0 37 63

I will try to avoid rebooting the machine for the next 24 hours in case
someone wants me to run further diagnostics.

>How-To-Repeat:

I don't know; this has happened only once so far.  I had been using
dtrace, so perhaps that is what triggered it, but I have no evidence
either way.

>Fix:


