Subject: Re: How to debug pthreads as end-user ?
To: Martin Weber <Ephaeton@gmx.net>
From: Stephan Uphoff <ups@stups.com>
List: current-users
Date: 08/15/2003 20:25:42
Hi,

The following i386-only patch should help if you really are running into
the page fault problem.

It basically blocks all upcalls for page faults coming from userspace
by clearing L_SA in l_flag while the fault is handled and restoring it
afterwards. Let me know if it helps.
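
As a rough user-space illustration of the save/clear/restore idea in the
patch (this is not kernel code: struct demo_lwp, DEMO_L_SA, its bit value
and both helper names are made up for this sketch, and only the l_flag
field name is borrowed from the diff):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's LWP structure and L_SA flag.
 * The bit value is illustrative, not taken from the NetBSD headers. */
#define DEMO_L_SA 0x0400

struct demo_lwp {
	int l_flag;
};

/* Remember the current flags, then clear the SA bit so that no upcall
 * would be generated while the fault is being handled. */
static int
demo_block_upcalls(struct demo_lwp *l)
{
	int old_l_flag;

	assert(l != NULL);
	old_l_flag = l->l_flag;
	l->l_flag &= ~DEMO_L_SA;
	return old_l_flag;
}

/* Put back only the SA bit from the saved flags; any other bits that
 * changed in the meantime are left untouched. */
static void
demo_restore_upcalls(struct demo_lwp *l, int old_l_flag)
{
	assert(l != NULL);
	l->l_flag |= (old_l_flag & DEMO_L_SA);
}
```

Restoring with "|= (old & flag)" rather than assigning the saved word back
means flag bits set by other code during the fault are not lost.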

> Then I made the great error :) me dummy
> started gdb and attached to phoenix. That was an error :> THE WHOLE
> SYSTEM HUNG ROCK SOLID. 

Oops - I guess this makes it my problem :-(
How recent is your kernel?
(Some debugging problems were fixed on August 11.)

Should the system lock up again, I would be really interested in a stack
backtrace.

	Stephan


Index: sys/arch/i386/i386/trap.c
===================================================================
RCS file: /cvsroot/src/sys/arch/i386/i386/trap.c,v
retrieving revision 1.182
diff -u -r1.182 trap.c
--- sys/arch/i386/i386/trap.c   2003/06/23 11:01:21     1.182
+++ sys/arch/i386/i386/trap.c   2003/07/25 02:54:20
@@ -495,9 +495,16 @@
                vm_prot_t ftype;
                extern struct vm_map *kernel_map;
                unsigned nss;
+               int old_l_flag;
 
+       
                cr2 = rcr2();
                KERNEL_PROC_LOCK(l);
+
+               /* Hack around SA architecture problem - disable blocking upcall */
+               old_l_flag =  (l)->l_flag;       
+               (l)->l_flag &= ~L_SA;    
+
        faultcommon:
                vm = p->p_vmspace;
                if (vm == NULL)
@@ -560,6 +567,9 @@
                                KERNEL_UNLOCK();
                                return;
                        }
+
+
+                       (l)->l_flag |= (old_l_flag & L_SA);
                        KERNEL_PROC_UNLOCK(l);
                        goto out;
                }
@@ -587,8 +597,10 @@
                }
                if (type == T_PAGEFLT)
                        KERNEL_UNLOCK();
-               else
+               else {
+                       (l)->l_flag |= (old_l_flag & L_SA);
                        KERNEL_PROC_UNLOCK(l);
+               }
                break;
        }
 



> On Fri, Aug 15, 2003 at 08:01:53AM -0700, Stephen Ma wrote:
> > >>>>> "Martin" == Martin Weber <Ephaeton@gmx.net> writes:
> > 
> > Martin> Does this help anyone ? what else ? what would help instead ?
> 
> Sorry for the long post, just want to make sure you know how
> exactly I shot in my foot :)
> 
> > 
> > This could be the problem with page faults happening during the SA
> > upcall that Stephan Uphoff mentioned on tech-kern@netbsd.org
> > earlier. Unfortunately, the page fault problem needs some rather
> > delicate changes to the kernel SA code. If you want to know for sure,
> > try using the attached program (run it with an argument pointing to
> > the libpthreads.so.x.y that phoenix/firebird/whatever is linked
> > against). This'll lock the libpthreads into memory, and should avoid
> > the page-fault (unfortunately, it can only lock the code into memory,
> > so a data page could still be paged out, but that is less likely).
> > 
> > - S
> > 
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <fcntl.h>
> > #include <unistd.h>
> > #include <sys/mman.h>
> > #include <sys/stat.h>
> > 
> > int
> > main(int argc, char **argv)
> > {
> > 	int fd;
> > 	struct stat st;
> > 	void *addr;
> > 	
> > 	if (argc != 2) {
> > 		fprintf(stderr, "Usage: %s <file>\n", argv[0]);
> > 		exit(1);
> > 	}
> > 
> > 	if ((fd = open(argv[1], O_RDONLY, 0)) == -1) {
> > 		fprintf(stderr, "Unable to open %s ", argv[1]);
> > 		perror("open()");
> > 		exit(1);
> > 	}
> > 
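> > 	if (fstat(fd, &st) == -1) {
> > 		perror("fstat()");
> > 		exit(1);
> > 	}
> > 	addr = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
> > 	if (addr == MAP_FAILED) {
> > 		perror("mmap()");
> > 		exit(1);
> > 	}
> > 	if (mlock(addr, st.st_size) == -1) {
> > 		perror("mlock()");
> > 		exit(1);
> > 	}
> > 
> > 	printf("Locked %lld bytes of %s at %p\n", (long long) st.st_size, argv[1], addr);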
> > 
> > 	for (;;) {
> > 		sleep(1);
> > 	}
> > 
> > 	return 0;
> > }
> 
> Oookay. So I started xmms and phoenix, and some memory and cpu
> eaters, and observed that neither xmms nor phoenix crashed. Fine,
> I didn't have the chance to test it for toooo long, because after
> some minutes I thought fine, let's kill the cpu- and mem- eaters
> again and see what phoenix does. Well, it spun there eating all
> the available cpu (I'm pretty sure its own pages were swapped out
> somewhere in the middle). Then I made the great error :) me dummy
> started gdb and attached to phoenix. That was an error :> THE WHOLE
> SYSTEM HUNG ROCK SOLID. That is. Xmms was continuing on issuing a
> one second frame again and again (the same one), but the system
> itself hung. No network, console, reaction on anything I did (until
> I hit the power button).
> 
> May this be a lesson to me: Stop (trying to) debug(ging) things
> without directions (where are they?!) from the ppl who know the
> internals :)
> 
> -Martin
>