trip report: figuring out where python cpu time is going, libcares 1.34.4 is bad
tl;dr: If you have any process with unexplained 100% CPU and you have
libcares 1.34.4, upgrade to 1.34.5.
I solved my own problem, but starting to write this up was part of what
helped. I'm sending it anyway, as it might be useful to others.
Over the last month or so, Home Assistant (HA, a heavily async Python
program that generally works well on NetBSD) has been using 100% CPU
time, while continuing to function fine.
I therefore guess that something that should be blocking/waiting is
instead busy-looping, but I emphasize guess.
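To make that guess concrete, a minimal sketch (not HA's code) of the
difference: a healthy event loop blocks in the kernel via
kqueue.control() with no timeout, while a busy loop passes a zero
timeout and spins.

import select

kq = select.kqueue()

# Healthy: block until an event arrives; ~zero CPU while idle.
events = kq.control(None, 64, None)

# Broken: a zero timeout returns immediately, so this spins at 100% CPU
# even though every stack sample still lands in kqueue.control.
while True:
    events = kq.control(None, 64, 0)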
I tried the HA profiler, but over 60s it showed that ~all time was in
~:0(<method 'control' of 'select.kqueue' objects>)
with 67 calls, which seems fine. Other inspections of the Python side
didn't turn up trouble, presumably because a Python profiler only sees
interpreter threads, not CPU burned in a thread that a C library
created for itself.
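For reference, this is roughly what such a profiler does for any
asyncio app (a sketch, not HA's actual profiler code): enable cProfile
on the event-loop thread, sleep for the sampling window, dump stats.

import asyncio
import cProfile
import pstats

async def profile_loop(seconds=60):
    # cProfile hooks the whole thread, so everything the event loop
    # runs while we sleep gets counted, not just this coroutine.
    prof = cProfile.Profile()
    prof.enable()
    await asyncio.sleep(seconds)
    prof.disable()
    pstats.Stats(prof).sort_stats("cumulative").print_stats(15)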
There are 16 threads. I am guessing most are just waiting, and exist
because some integration (more or less a plugin for a type of device)
might make blocking calls, which get pushed onto worker threads to
avoid blocking the event loop; the sketch below shows the pattern.
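Assuming nothing about HA's internals, this is the standard asyncio
shape (read_device is a hypothetical stand-in): blocking work goes to
the default thread pool, which would account for a pile of mostly idle
threads.

import asyncio

def read_device():
    # stand-in for a blocking call an integration might make
    ...

async def handler():
    loop = asyncio.get_running_loop()
    # Runs the blocking call on a worker thread from the default
    # ThreadPoolExecutor, so the event loop keeps servicing kqueue.
    result = await loop.run_in_executor(None, read_device)
    return result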
Running ps with 's' shows one LWP with ~all the CPU time used (though
plain 'ps auxs' does not show per-LWP times: it shows TIME instead of
LTIME):
1039 6487 1 27767 3382 16 30 0 922800 201656 - O pts/3 23:50.57 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check
1039 6487 1 27767 1833 16 85 0 922800 201656 kqueue I pts/3 0:00.00 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check
1039 6487 1 27767 3561 16 85 0 922800 201656 psem S pts/3 0:00.18 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check
1039 6487 1 27767 4900 16 85 0 922800 201656 psem I pts/3 0:00.00 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check
1039 6487 1 27767 1954 16 85 0 922800 201656 psem I pts/3 0:00.03 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check
1039 6487 1 27767 1893 16 85 0 922800 201656 psem I pts/3 0:00.00 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check
1039 6487 1 27767 1888 16 85 0 922800 201656 psem I pts/3 0:01.05 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check
1039 6487 1 27767 1890 16 85 0 922800 201656 psem I pts/3 0:00.03 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check
1039 6487 1 27767 4955 16 85 0 922800 201656 psem I pts/3 0:00.01 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check
1039 6487 1 27767 4660 16 85 0 922800 201656 psem I pts/3 0:00.03 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check
1039 6487 1 27767 1889 16 85 0 922800 201656 psem I pts/3 0:00.01 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check
1039 6487 1 27767 5723 16 85 0 922800 201656 psem I pts/3 0:00.03 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check
1039 6487 1 27767 4959 16 85 0 922800 201656 psem I pts/3 0:00.02 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check
1039 6487 1 27767 4956 16 85 0 922800 201656 psem I pts/3 0:00.02 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check
1039 6487 1 27767 3421 16 85 0 922800 201656 psem I pts/3 0:00.01 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check
1039 6487 1 27767 6487 16 85 0 922800 201656 kqueue S pts/3 0:01.80 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check
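As a cross-check from inside the process, if you can get a REPL there:
Python 3.8+ exposes the kernel thread id, which on NetBSD should
correspond to the LWP id in the ps output (an assumption worth
verifying), so a hot LWP missing from this list belongs to a C library
rather than to Python.

import threading

# Map Python threads to their kernel thread (LWP) ids.
for t in threading.enumerate():
    print(t.native_id, t.name)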
So I've found my problem LWP. Firing up gdb and attaching, I have to
scroll through many pages of library loads, because HA links in a vast
amount. But then, it's on the right LWP:
[Switching to LWP 3382 of process 6487]
0x0000791ac462b0cc in ares_thread_cond_timedwait () from /usr/pkg/lib/libcares.so.2
(gdb) bt
#0 0x0000791ac462b0cc in ares_thread_cond_timedwait () from /usr/pkg/lib/libcares.so.2
#1 0x0000791ac461ebc2 in ?? () from /usr/pkg/lib/libcares.so.2
#2 0x0000791ad4c0c89f in ?? () from /usr/lib/libpthread.so.1
#3 0x0000791ad3a930e0 in ?? () from /usr/lib/libc.so.12
#4 0x0000000000000000 in ?? ()
And sure enough, there is a bugfix for libcares in pkgsrc-HEAD, not in
2025Q1, about excessive CPU time on non-Linux systems.
Updating to libcares 1.34.5 fixed the problem.
Maybe this started when I updated to 2025Q1, which brought in 1.34.4
with the newly introduced bug, and I didn't notice right away.