Subject: Re: How to debug pthreads as end-user ?
To: Stephen Ma <>
From: Martin Weber <>
List: current-users
Date: 08/15/2003 18:44:38
On Fri, Aug 15, 2003 at 08:01:53AM -0700, Stephen Ma wrote:
> >>>>> "Martin" == Martin Weber <> writes:
> Martin> Does this help anyone ? what else ? what would help instead ?

Sorry for the long post, I just want to make sure you know exactly
how I shot myself in the foot :)

> This could be the problem with page faults happening during the SA
> upcall that Stephan Uphoff mentioned on
> earlier. Unfortunately, the page fault problem needs some rather
> delicate changes to the kernel SA code. If you want to know for sure,
> try using the attached program (run it with an argument pointing to
> the that phoenix/firebird/whatever is linked
> against). This'll lock the libpthreads into memory, and should avoid
> the page-fault (unfortunately, it can only lock the code into memory,
> so a data page could still be paged out, but that is less likely).
> - S
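> #include <stdio.h>
> #include <stdlib.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include <sys/mman.h>
> #include <sys/stat.h>
> int
> main(int argc, char **argv)
> {
> 	int fd;
> 	struct stat st;
> 	void *addr;
> 	if (argc != 2) {
> 		fprintf(stderr, "Usage: %s <file>\n", argv[0]);
> 		exit(1);
> 	}
> 	if ((fd = open(argv[1], O_RDONLY, 0)) == -1) {
> 		fprintf(stderr, "Unable to open %s ", argv[1]);
> 		perror("open()");
> 		exit(1);
> 	}
> 	if (fstat(fd, &st) == -1) {
> 		perror("fstat()");
> 		exit(1);
> 	}
> 	/* Map the whole file read-only so its pages can be locked. */
> 	addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
> 	if (addr == MAP_FAILED) {
> 		perror("mmap()");
> 		exit(1);
> 	}
> 	/* Pin the mapping in RAM; may need root or a raised memlock limit. */
> 	if (mlock(addr, (size_t)st.st_size) == -1) {
> 		perror("mlock()");
> 		exit(1);
> 	}
> 	printf("Locked %lld bytes of %s at %p\n", (long long)st.st_size, argv[1], addr);
> 	/* Stay alive: the lock goes away when this process exits. */
> 	for (;;) {
> 		sleep(1);
> 	}
> 	return 0;
> }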
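
A side note on the program above: it reports the mapping address but not
whether the pages really stay in core. A rough sketch of how one might check
that, assuming mincore(2) is available, is below. The helper name
resident_pages() is made up for illustration and is not part of Stephen's
program; the (void *) cast on the vector is only there because the vector's
element type differs between systems.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1	/* exposes mincore() on glibc; harmless elsewhere */
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (an assumption, not from the original program):
 * map `path` read-only and count how many of its pages are resident.
 * Returns the resident page count and stores the total page count in
 * *total, or returns -1 on any error.
 */
long
resident_pages(const char *path, long *total)
{
	struct stat st;
	unsigned char *vec;
	void *addr;
	long pagesize, npages, i, resident = 0;
	int fd;

	if ((fd = open(path, O_RDONLY)) == -1)
		return -1;
	if (fstat(fd, &st) == -1 || st.st_size <= 0) {
		close(fd);
		return -1;
	}
	pagesize = sysconf(_SC_PAGESIZE);
	npages = (long)((st.st_size + pagesize - 1) / pagesize);

	addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);		/* the mapping stays valid after close() */
	if (addr == MAP_FAILED)
		return -1;

	/* mincore() fills in one status byte per page of the mapping. */
	vec = malloc((size_t)npages);
	if (vec == NULL ||
	    mincore(addr, (size_t)st.st_size, (void *)vec) == -1) {
		free(vec);
		munmap(addr, (size_t)st.st_size);
		return -1;
	}
	/* The low bit of each entry is set if that page is in core. */
	for (i = 0; i < npages; i++)
		if (vec[i] & 1)
			resident++;

	free(vec);
	munmap(addr, (size_t)st.st_size);
	*total = npages;
	return resident;
}
```

Calling resident_pages() on the library file while the locker is running and
again under memory pressure should, in principle, show whether the code pages
stay resident. It says nothing about the data pages, which the locker cannot
pin anyway.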

Oookay. So I started xmms and phoenix, plus some memory and CPU
eaters, and observed that neither xmms nor phoenix crashed. Fine.
I didn't get the chance to test it for very long, though, because
after a few minutes I thought: fine, let's kill the CPU and memory
eaters again and see what phoenix does. Well, it spun there eating
all the available CPU (I'm pretty sure its own pages were swapped
out somewhere in the middle). Then I made the big mistake :) Dummy
me started gdb and attached to phoenix. THE WHOLE SYSTEM HUNG ROCK
SOLID. That is, xmms kept playing the same one-second frame again
and again, but the system itself hung. No network, no console, no
reaction to anything I did (until I hit the power button).

May this be a lesson to me: stop (trying to) debug things
without directions (where are they?!) from the people who know the
internals :)