NetBSD-Bugs archive
Re: kern/53591: [system] process uses >400% CPU on idle machine
The following reply was made to PR kern/53591; it has been noted by GNATS.
From: Lars Reichardt <lars%paradoxon.info@localhost>
To: gnats-bugs%NetBSD.org@localhost, kern-bug-people%netbsd.org@localhost,
gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost
Cc:
Subject: Re: kern/53591: [system] process uses >400% CPU on idle machine
Date: Tue, 11 Sep 2018 12:09:35 +0200
On 9/11/18 10:50 AM, Andreas Gustafsson wrote:
>> Number: 53591
>> Category: kern
>> Synopsis: [system] process uses >400% CPU on idle machine
>> Confidential: no
>> Severity: serious
>> Priority: high
>> Responsible: kern-bug-people
>> State: open
>> Class: sw-bug
>> Submitter-Id: net
>> Arrival-Date: Tue Sep 11 08:50:00 +0000 2018
>> Originator: Andreas Gustafsson
>> Release: NetBSD 8.0
>> Organization:
>> Environment:
> System: NetBSD guido
> Architecture: x86_64
> Machine: amd64
>> Description:
> My 12-core HP DL360 G7 system running NetBSD/amd64 8.0 has now somehow
> gotten itself into a state where the [system] process is using >400%
> CPU even though the system is idle. "top" shows:
>
> load averages: 0.00, 0.00, 0.80; up 1+18:48:30
> 51 processes: 45 sleeping, 4 stopped, 2 on CPU
> CPU states: 0.0% user, 0.0% nice, 34.8% system, 0.0% interrupt, 65.1% idle
> Memory: 20G Act, 10G Inact, 348K Wired, 33M Exec, 4875M File, 62M Free
> Swap:
>
> PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
> 0 root 0 0 0K 133M CPU/11 507:36 0.00% 353% [system]
> 484 pgsql 85 0 77M 4572K select/7 2:45 0.00% 0.00% postgres
> 6099 gson 85 0 95M 3020K select/6 0:58 0.00% 0.00% sshd
>
> Pressing the "t" key shows that the kernel threads eating CPU are
> the pgdaemon and xcall threads:
>
> load averages: 0.00, 0.00, 0.76; up 1+18:49:12
> 217 threads: 49 idle, 1 runnable, 146 sleeping, 8 stopped, 1 zombie, 12 on CPU
> CPU states: 0.0% user, 0.0% nice, 35.8% system, 0.0% interrupt, 64.1% idle
> Memory: 20G Act, 10G Inact, 348K Wired, 33M Exec, 4875M File, 62M Free
> Swap:
>
> PID LID USERNAME PRI STATE TIME WCPU CPU NAME COMMAND
> 0 7 root 127 xcall/0 43:21 61.96% 61.96% xcall/0 [system]
> 0 22 root 127 xcall/1 42:08 47.22% 47.22% xcall/1 [system]
> 0 28 root 127 xcall/2 39:35 42.97% 42.97% xcall/2 [system]
> 0 34 root 127 RUN/3 34:54 31.59% 31.59% xcall/3 [system]
> 0 52 root 127 xcall/6 29:36 30.96% 30.96% xcall/6 [system]
> 0 58 root 127 xcall/7 28:53 29.88% 29.88% xcall/7 [system]
> 0 70 root 127 xcall/9 26:41 29.69% 29.69% xcall/9 [system]
> 0 64 root 127 xcall/8 26:46 29.49% 29.49% xcall/8 [system]
> 0 156 root 126 xclocv/1 92:15 29.44% 29.44% pgdaemon [system]
> 0 82 root 127 xcall/11 24:05 28.47% 28.47% xcall/11 [system]
> 0 46 root 127 xcall/5 31:20 28.12% 28.12% xcall/5 [system]
> 0 40 root 127 xcall/4 30:48 25.29% 25.29% xcall/4 [system]
> 0 76 root 127 xcall/10 24:03 25.05% 25.05% xcall/10 [system]
> 0 157 root 124 syncer/4 22:45 0.00% 0.00% ioflush [system]
> 0 158 root 125 aiodon/9 5:12 0.00% 0.00% aiodoned [system]
> 0 84 root 96 ipmicm/1 5:04 0.00% 0.00% ipmi [system]
> 484 1 pgsql 85 select/2 2:45 0.00% 0.00% - postgres
> 0 9 root 125 vdrain/1 1:17 0.00% 0.00% vdrain [system]
> 0 159 root 123 physio/0 1:12 0.00% 0.00% physiod [system]
>
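The xcall/N entries in the listing above are NetBSD's per-CPU cross-call
service threads (see xcall(9)): they sit idle until some subsystem asks to
run a function on every CPU, so sustained CPU time there means something is
broadcasting cross-calls in a loop. A minimal sketch of such a broadcast,
with an illustrative callback name (count_up) that is not taken from any
real kernel path:

#include <sys/atomic.h>
#include <sys/xcall.h>

static uint64_t xc_hits;

static void
count_up(void *arg1, void *arg2)
{
	/* Runs once on every CPU, in that CPU's xcall/N thread. */
	atomic_inc_64(&xc_hits);
}

static void
broadcast_example(void)
{
	/*
	 * flags == 0 selects the low-priority path, i.e. the per-CPU
	 * xcall/N kernel threads visible in the listing above.
	 */
	uint64_t where = xc_broadcast(0, count_up, NULL, NULL);

	xc_wait(where);	/* returns once every CPU has run count_up() */
}

One plausible, but unproven, reading of the listing is that the pagedaemon
is looping trying to reclaim kernel memory and issuing such broadcasts on
every pass (pool cache invalidation uses this mechanism), which would
account for both the pgdaemon and the xcall/N times shown.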
> Output from "vmstat 1":
>
> procs memory page disks faults cpu
> r b avm fre flt re pi po fr sr l0 s0 in sy cs us sy id
> 1 8 21024468 74920 15313 1 0 0 191 532 79 44 170 11879 38629 3 3 93
> 0 8 21024468 74920 1 0 0 0 0 0 0 0 8 121 960529 0 36 64
> 0 8 21024468 74668 613 0 0 0 0 0 0 3 27 316 951463 0 37 63
> 0 8 21024468 74672 0 0 0 0 0 0 0 0 3 25 958574 0 37 63
> 0 8 21024468 74672 0 0 0 0 0 0 0 0 2 28 962733 0 35 65
> 0 8 21024468 74940 0 0 0 0 0 0 0 0 2 25 957158 0 36 64
> 0 8 21024468 74940 0 0 0 0 0 0 0 0 4 106 953688 0 37 63
>
> I will try to avoid rebooting for 24 hours in case someone wants me to
> run other diagnostics.
>
>> How-To-Repeat:
> Don't know, this has only happened once so far. I had been using dtrace,
> so maybe that's what triggered it. Or not.
>
>> Fix:
How much memory does the machine have? Maybe some pools (those with
larger-than-PAGE_SIZE allocators) have eaten all of the kmem_va space.
What does vmstat -mv show?
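If that is what happened, the vmstat -m output should show one or more
pools whose page counts dominate. For background, a pool whose item size
exceeds PAGE_SIZE is backed by one of the multi-page allocators that
pool(9) selects when no explicit allocator is given, and those pages come
out of kmem_va. A minimal sketch of creating such a pool, with a
hypothetical pool name:

#include <sys/param.h>
#include <sys/intr.h>
#include <sys/pool.h>

static struct pool example_pool;

static void
example_pool_init(void)
{
	/*
	 * Hypothetical pool of 16 KiB items.  Since 16 KiB exceeds
	 * PAGE_SIZE (4 KiB on amd64), the NULL allocator argument
	 * makes pool(9) fall back to a multi-page backing allocator;
	 * its pages are the kmem_va usage asked about above.
	 */
	pool_init(&example_pool, 16 * 1024, 0, 0, 0, "exmplpl",
	    NULL, IPL_NONE);
}

Every pool_get() that misses then pulls whole pages from that backing
allocator, so a leak in such a pool would show up as kmem_va exhaustion
rather than as any ordinary process growing in top.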