Port-sparc archive


Re: NetBSD 10.0 - configure test process hang



    Date:        Fri, 8 Nov 2024 16:51:43 +0100
    From:        Riccardo Mottola <riccardo.mottola%libero.it@localhost>
    Message-ID:  <707ec082-002d-5678-c39e-263f5d0168db%libero.it@localhost>


  | Now the box did build for some time and in gdk it hangs. I have
  |
  | [56/255] Compiling C object gdk-pixbuf/libpixbufloader-qtif.so.p/io-qtif.c.o
  | ../gdk-pixbuf/io-qtif.c: In function
  | 'gdk_pixbuf__qtif_image_load_increment':
  | ../gdk-pixbuf/io-qtif.c:441:33: warning: cast increases required
  | alignment of target type [-Wcast-align]
  |   441 |                 QtHeader *hdr = (QtHeader *)context->header_buffer;
  |       |                                 ^
  | [248/255] Generating docs/man-gdk-pixbuf-query-loade... custom command
  | (wrapped by meson to capture output

That warning is harmless provided that header_buffer is in fact sufficiently
aligned to hold a QtHeader (if not, you'll likely see crashes from unaligned
accesses, typically a bus error, when the result is run on a sparc).


  | and sits here forever.. I check, CPUs are idle

From your ps output (which must also have had 'w', which can be useful
but can also make things hard to read because of the long lines it
produces), the interesting processes seem to be:

  |    0  6057 29639 22604  80  0 10916 3628 select I+   pts/1 0:07.52 ninja
  | -j 2 -C output

"select" means it is waiting for something to happen, perhaps to read
some output.

Its ancestor chain (parent, grandparent, ...) is pids 29639, 24416, 29612,
1736, 7001, 28722, 29395, 16410, 11612, 22302, 2638, 2040, 1949,

all of which except the last two are in "wait" state, waiting for a
child ("the" child, I suspect) to complete.   The last two are so far
up the food chain that even though they're not actually doing a wait,
what they are doing is probably close enough to one (perhaps via a
different mechanism).

That leaves

  |    0  1406     1     0  85  0  5872  392 ttyraw Is+  ttya  0:00.11
  | /usr/libexec/getty suncons constty

which is clearly irrelevant (waiting for someone to login on the console)

and

  |    0 17076  6057 51657  32  0 19524 3292 parked Il   pts/1 0:04.96
  | /disk2/pkg-workdir/graphics/gdk-pixbuf2/work/.buildlink/bin/glib-compile-resources
  | ../tests/resources.gresource.xml (glib-compile-res)

and

  |    0 27348 17076     0   0  0     0    0 -      Z    pts/1 0:00.00
  | (gdk-pixbuf-pixda)

The last of those (27348) is a zombie: it finished its work and exited.
Its parent is the other odd one (17076).  It should be waiting for that dead
child and then, I presume, doing something, but it is instead parked (a thread
just waiting to be told what to do).   There might be another thread in some
other state, which ps didn't pick to display for that process.

That could be a kernel issue, but could just as easily be a logic
(locking, inter-thread communication, ...) bug in the application.  Either
way, that process is where you need to be concentrating.  Include "-s" in
the ps options (so you can see all the threads) and -p 17076 so you only get
the threads that are a part of that process (adjust the pid as needed if
you are doing this with a different process tree, rather than that same one
just sitting there still).

After that you might need to try forcing core dumps, or doing kernel
stack tracing, and you've certainly moved outside any area I can help
with.  I don't know a lot about threads, and nothing about getting stack
traces of threads inside the kernel, especially on a sparc, to see what
some thread that isn't just "parked" is waiting upon (if all are parked,
it is likely to be an application bug, pure and simple).

kre



