pkgsrc-Users archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

trip report: figuring out where python cpu time is going, libcares 1.34.4 is bad



tl:dr: If you have any process with unexplained 100% cpu, and you have
libcares 1.34.4, upgrade to 1.34.5.

I solved my own problem, but starting to write this was helpful.
Sending it anyway as it might be helpful to others.


Over the last month or so, Home Assistant (HA, a python program that is
heavily async, and generally works well on NetBSD) is using 100% cpu
time, but continuing to function fine.

I guess therefore that there is something that should be
blocking/waiting that is instead busy-looping, but I emphasize guess.

I tried the HA profiler, but it showed in 60s that ~all was in

  ~:0(<method 'control' of 'select.kqueue' objects>)

with 67 calls, which seems fine.   Other inspections of python don't
turn up trouble.

There are 16 threads.  I am guessing most are just waiting and there
because some integration (plugin for a type of device, more or less)
might do blocking calls, to avoid the system blocking.

Running ps with 's' shows 1 thread with ~all the cpu time used.  However
'ps auxs' does not show per-LWP times, showing TIME instead of LTIME.

1039 6487    1 27767 3382   16  30  0 922800 201656 -      O    pts/3 23:50.57 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check 
1039 6487    1 27767 1833   16  85  0 922800 201656 kqueue I    pts/3  0:00.00 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check 
1039 6487    1 27767 3561   16  85  0 922800 201656 psem   S    pts/3  0:00.18 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check 
1039 6487    1 27767 4900   16  85  0 922800 201656 psem   I    pts/3  0:00.00 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check 
1039 6487    1 27767 1954   16  85  0 922800 201656 psem   I    pts/3  0:00.03 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check 
1039 6487    1 27767 1893   16  85  0 922800 201656 psem   I    pts/3  0:00.00 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check 
1039 6487    1 27767 1888   16  85  0 922800 201656 psem   I    pts/3  0:01.05 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check 
1039 6487    1 27767 1890   16  85  0 922800 201656 psem   I    pts/3  0:00.03 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check 
1039 6487    1 27767 4955   16  85  0 922800 201656 psem   I    pts/3  0:00.01 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check 
1039 6487    1 27767 4660   16  85  0 922800 201656 psem   I    pts/3  0:00.03 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check 
1039 6487    1 27767 1889   16  85  0 922800 201656 psem   I    pts/3  0:00.01 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check 
1039 6487    1 27767 5723   16  85  0 922800 201656 psem   I    pts/3  0:00.03 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check 
1039 6487    1 27767 4959   16  85  0 922800 201656 psem   I    pts/3  0:00.02 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check 
1039 6487    1 27767 4956   16  85  0 922800 201656 psem   I    pts/3  0:00.02 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check 
1039 6487    1 27767 3421   16  85  0 922800 201656 psem   I    pts/3  0:00.01 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check 
1039 6487    1 27767 6487   16  85  0 922800 201656 kqueue S    pts/3  0:01.80 /var/hass/HASS313/bin/python ./HASS313/bin/hass --ignore-os-check 

so I've found my problem lwp. Firing up gdb and attaching, I have to
scroll through many pages of loading libs, because HA links in a vast
amount.  But then, it's on the right LWP:

  [Switching to LWP 3382 of process 6487]
  0x0000791ac462b0cc in ares_thread_cond_timedwait () from /usr/pkg/lib/libcares.so.2
  (gdb) bt
  #0  0x0000791ac462b0cc in ares_thread_cond_timedwait () from /usr/pkg/lib/libcares.so.2
  #1  0x0000791ac461ebc2 in ?? () from /usr/pkg/lib/libcares.so.2
  #2  0x0000791ad4c0c89f in ?? () from /usr/lib/libpthread.so.1
  #3  0x0000791ad3a930e0 in ?? () from /usr/lib/libc.so.12
  #4  0x0000000000000000 in ?? ()

and sure enough there is a bugfix in libcares in pkgsrc-HEAD, not in
2025Q1, about excessive CPU time on !linux.

Updating to libcares 1.34.5 fixed the problem.

Maybe this started when I updated to 2025Q1 and the bug was newly in
1.34.4 and I didn't notice.


Home | Main Index | Thread Index | Old Index