Subject: pthread stack problem
To: None <tech-userlevel@netbsd.org>
From: Chuck Silvers <chuq@chuq.com>
List: tech-userlevel
Date: 08/05/2004 08:05:36
--sm4nu43k4a2Rpi4c
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

hi,

I'm looking for input on how to fix a problem in -current and 2.0
having to do with libpthread's use of the initial "main process" stack.

In pthread__initmain(), we compute the base and size that we'll use for
the initial pthread stack and pass those values to pthread__stackid_setup(),
where we use them to set t->pt_uc to an initial "safe" value.  The problem
is that the safe value isn't so safe for the initial process stack.
There are two issues:

 (1) on some platforms such as i386, the computed "base + size" includes
     the page-table-mapping, which isn't even part of the user address space.
     since the "safe" value is within this region, it's not a valid pointer.

 (2) even on platforms without the first problem, if the argv and environment
     strings and pointers use enough space, t->pt_uc will point into the
     middle of them, which will result in corruption of the environment.

I already made one change to try to address this, by using the current stack
pointer to bound the stack region that libpthread will use, but that has
a problem of its own.  Since pthread__initmain() is several frames down
the stack, the stack pointer can be past where it will be later when we
call main(), and so the stack pointer will appear to be out of bounds.
This causes a problem for the native JDK port (available in pkgsrc-wip),
since it has checks to make sure that the stack pointer is within bounds:

#0  0x484bbf0b in kill () from /usr/lib/libc.so.12
#1  0x48530933 in abort () from /usr/lib/libc.so.12
#2  0x4836981a in os::abort(int) ()
   from /home/chs/netbsd/pkgsrc/wip/jdk14/work/control/build/bsd-i586/lib/i386/client/libjvm.so
#3  0x48234b61 in report_error(int, char const*, int, char const*, char const*, ...) ()
   from /home/chs/netbsd/pkgsrc/wip/jdk14/work/control/build/bsd-i586/lib/i386/client/libjvm.so
#4  0x48234638 in report_fatal(char const*, int, char const*, ...) ()
   from /home/chs/netbsd/pkgsrc/wip/jdk14/work/control/build/bsd-i586/lib/i386/client/libjvm.so
#5  0x483b6655 in ThreadLocalStorage::get_thread_via_cache_slowly(unsigned, int) ()
   from /home/chs/netbsd/pkgsrc/wip/jdk14/work/control/build/bsd-i586/lib/i386/client/libjvm.so
#6  0x48278bb4 in JNI_CreateJavaVM ()
   from /home/chs/netbsd/pkgsrc/wip/jdk14/work/control/build/bsd-i586/lib/i386/client/libjvm.so
#7  0x0804a583 in InitializeJVM ()
#8  0x080492d6 in main ()


Thread* ThreadLocalStorage::get_thread_via_cache_slowly(uintptr_t raw_id,
                                                        int index) {
  Thread *thread = get_thread_slow();
  if (thread != NULL) {
    address sp = pd_sp_address();
    guarantee(thread->_stack_base == NULL ||
              (sp <= thread->_stack_base && 
                 sp >= thread->_stack_base - thread->_stack_size) || 
               is_error_reported(),
              "sp must be inside of selected thread stack");

    thread->_self_raw_id = raw_id;  // mark for quick retrieval
    _get_thread_cache[ index ] = thread;
  }
  return thread;
}


JDK only detected this problem because I rounded the stack pointer down
to a page boundary, which isn't necessary.  If I change the code to
not round down (as in the attached patch), then JDK is happy again.
However, the attached program shows demonstrates that the problem is
still detectable.  for me it prints:

29 spathi:~> cc -o checkstack checkstack.c -lpthread
30 spathi:~> ./checkstack 
addr 0xbe002000 size 0x1bfd82c end 0xbfbff82c sp 0xbfbff8a4
checkstack: sp NOT within stack bounds!
31 spathi:~> 


I've thought of a couple possible ways to fix this:

 (1) Change the caller of main() to record its current stack pointer in a
     global variable and let pthread__initmain() use that to initialize
     the stack bounds for the initial thread.

 (2) Change the kernel to make the usable part of the stack (ie. not including
     the argv and environment) aligned such that the (base + size) address
     that pthread__initmain() computes is valid and not in the middle of
     the argv or environment.

Does anyone have any other ideas on how to fix this?
Or would anyone like to especially advocate one of the above ideas?
I'm not very happy with either of them since they both involve changes
outside of libpthread itself.

-Chuck

--sm4nu43k4a2Rpi4c
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="diff.pthread.initmain2"

Index: pthread_stack.c
===================================================================
RCS file: /cvsroot/src/lib/libpthread/pthread_stack.c,v
retrieving revision 1.12.2.1
diff -u -p -r1.12.2.1 pthread_stack.c
--- pthread_stack.c	26 Jul 2004 07:09:38 -0000	1.12.2.1
+++ pthread_stack.c	28 Jul 2004 14:41:18 -0000
@@ -149,7 +149,7 @@ pthread__initmain(pthread_t *newt)
 	 * Just trim off the part of the stack that we can't use.
 	 */
 	base = (void *) (pthread__sp() & ~PT_STACKMASK);
-	size = ((char *)pthread__sp() - (char *)base) & ~(pagesize - 1);
+	size = (char *)pthread__sp() - (char *)base;
 
 	*newt = pthread__stackid_setup(base, size);
 }

--sm4nu43k4a2Rpi4c
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="checkstack.c"

#include <sys/types.h>
#include <inttypes.h>
#include <pthread.h>
#include <err.h>

int
main()
{
	pthread_attr_t attr;
	void *mystack, *stackend;
	size_t mysize;
	int error, bug;
	register void *sp __asm__("%esp");

	error = pthread_attr_init(&attr);
	if (error) {
		errx(1, "pthread_attr_init %d", error);
	}
	error = pthread_attr_get_np(pthread_self(), &attr);
	if (error) {
		errx(1, "pthread_attr_get_np %d", error);
	}
	error = pthread_attr_getstackaddr(&attr, &mystack);
	if (error) {
		errx(1, "pthread_attr_getstackaddr %d", error);
	}
	error = pthread_attr_getstacksize(&attr, &mysize);
	if (error) {
		errx(1, "pthread_attr_getstacksize %d", error);
	}

	pthread_attr_destroy(&attr);

	stackend = (char *)mystack + mysize;
	printf("addr %p size 0x%x end %p sp %p\n",
	       mystack, mysize, stackend, sp);
	if (sp < mystack || sp > stackend) {
		errx(1, "sp NOT within stack bounds!");
	} else {
		errx(0, "sp ok");
	}
}

--sm4nu43k4a2Rpi4c--