NetBSD-Bugs archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
kern/43128: Threads support in ptrace() is insufficient for gdb to debug threaded live apps
>Number: 43128
>Category: kern
>Synopsis: Threads support in ptrace() is insufficient for gdb to debug
>threaded live apps
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Apr 05 15:30:00 +0000 2010
>Originator: Paul Koning
>Release: 5.0
>Organization:
Dell
>Environment:
NetBSD pkvm50 5.0 NetBSD 5.0 (GENERIC) #34: Mon Apr 5 10:26:00 EDT 2010
root@pkvm50:/usr/obj/sys/arch/i386/compile/GENERIC i386
>Description:
I have a working implementation of thread debug support in GDB 7.0.
In the process of doing this I found that the ptrace syscall is missing a
couple of features needed for debugging of threaded applications. In
particular, PT_STEP and PT_CONTINUE do not have a way to control which threads
to operate on.
As discussed on the tech-kern list, what is needed is the ability to step or
continue a single thread (leaving any others stopped) and also the ability to
step a specified thread (while allowing other threads to continue).
I have implemented changes to sys_ptrace to provide these missing features.
With these changes, GDB can now successfully debug threaded programs (which
will cure PR bin/30756). I'm ready to submit the GDB changes to the FSF but I
cannot do this unless the ptrace changes required for this are accepted by
NetBSD. I'm submitting my changes as part of this PR, and I propose these as
the solution to the issue. If a different solution is preferred, that's fine
so long as it provides the same capabilities (then I can adjust GDB to match
what is approved).
Part of the patch I include here is corrections to the ptrace manpage. Partly
this documents what I added, and partly it corrects inaccuracies in the
PT_LWPINFO description.
>How-To-Repeat:
Attempt to debug a live app with GDB.
>Fix:
--- sys/sys/lwp.h.orig 2009-02-05 20:54:09.000000000 -0500
+++ sys/sys/lwp.h 2010-04-05 10:14:35.000000000 -0400
@@ -284,6 +284,7 @@
int lwp_trylock(lwp_t *);
void lwp_addref(lwp_t *);
void lwp_delref(lwp_t *);
+void lwp_delref2(lwp_t *);
void lwp_drainrefs(lwp_t *);
bool lwp_alive(lwp_t *);
lwp_t *lwp_find_first(proc_t *);
@@ -293,6 +294,7 @@
void lwpinit(void);
int lwp_wait1(lwp_t *, lwpid_t, lwpid_t *, int);
void lwp_continue(lwp_t *);
+void lwp_unstop(lwp_t *);
void cpu_setfunc(lwp_t *, void (*)(void *), void *);
void startlwp(void *);
void upcallret(lwp_t *);
--- sys/kern/kern_sig.c.orig 2010-03-18 12:25:46.000000000 -0400
+++ sys/kern/kern_sig.c 2010-03-18 13:44:12.000000000 -0400
@@ -1674,9 +1674,10 @@
/*
* If we are no longer being traced, or the parent didn't
- * give us a signal, look for more signals.
+ * give us a signal, or we're stopping, look for more signals.
*/
- if ((p->p_slflag & PSL_TRACED) == 0 || p->p_xstat == 0)
+ if ((p->p_slflag & PSL_TRACED) == 0 || p->p_xstat == 0 ||
+ (p->p_sflag & PS_STOPPING) != 0)
return 0;
/*
--- sys/kern/kern_lwp.c.orig 2010-03-22 19:25:30.000000000 -0400
+++ sys/kern/kern_lwp.c 2010-04-05 10:14:41.000000000 -0400
@@ -354,6 +354,44 @@
}
/*
+ * Restart a stopped LWP.
+ *
+ * Must be called with p_lock held, and the LWP NOT locked. Will unlock the
+ * LWP before return.
+ */
+void
+lwp_unstop(struct lwp *l)
+{
+ struct proc *p = l->l_proc;
+
+ KASSERT(mutex_owned(proc_lock));
+ KASSERT(mutex_owned(p->p_lock));
+
+ lwp_lock(l);
+
+ /* If not stopped, then just bail out. */
+ if (l->l_stat != LSSTOP) {
+ lwp_unlock(l);
+ return;
+ }
+
+ p->p_stat = SACTIVE;
+ p->p_sflag &= ~PS_STOPPING;
+
+ if (!p->p_waited)
+ p->p_pptr->p_nstopchild--;
+
+ if (l->l_wchan == NULL) {
+ /* setrunnable() will release the lock. */
+ setrunnable(l);
+ } else {
+ l->l_stat = LSSLEEP;
+ p->p_nrlwps++;
+ lwp_unlock(l);
+ }
+}
+
+/*
* Wait for an LWP within the current process to exit. If 'lid' is
* non-zero, we are waiting for a specific LWP.
*
@@ -1368,11 +1406,25 @@
struct proc *p = l->l_proc;
mutex_enter(p->p_lock);
+ lwp_delref2(l);
+ mutex_exit(p->p_lock);
+}
+
+/*
+ * Remove one reference to an LWP. If this is the last reference,
+ * then we must finalize the LWP's death. The proc mutex is held
+ * on entry.
+ */
+void
+lwp_delref2(struct lwp *l)
+{
+ struct proc *p = l->l_proc;
+
+ KASSERT(mutex_owned(p->p_lock));
KASSERT(l->l_stat != LSZOMB);
KASSERT(l->l_refcnt > 0);
if (--l->l_refcnt == 0)
cv_broadcast(&p->p_lwpcv);
- mutex_exit(p->p_lock);
}
/*
--- sys/kern/sys_process.c.orig 2010-03-18 12:25:53.000000000 -0400
+++ sys/kern/sys_process.c 2010-04-05 10:15:24.000000000 -0400
@@ -155,7 +155,7 @@
syscallarg(int) data;
} */
struct proc *p = l->l_proc;
- struct lwp *lt;
+ struct lwp *lt, *lt2;
struct proc *t; /* target process */
struct uio uio;
struct iovec iov;
@@ -163,7 +163,8 @@
struct ptrace_lwpinfo pl;
struct vmspace *vm;
int error, write, tmp, req, pheld;
- int signo;
+ int signo = 0;
+ int resume_all;
ksiginfo_t ksi;
#ifdef COREDUMP
char *path;
@@ -393,6 +394,7 @@
write = 0;
*retval = 0;
tmp = 0;
+ resume_all = 1;
switch (req) {
case PT_TRACE_ME:
@@ -528,6 +530,44 @@
p->p_trace_enabled = trace_is_enabled(p);
/*
+ * Pick up the LWPID, if supplied. There are two cases:
+ * data < 0 : step or continue single thread, lwp = -data
+ * data > 0 in PT_STEP : step this thread, continue others
+ * For operations other than PT_STEP, data > 0 means
+ * data is the signo to deliver to the process.
+ */
+ tmp = SCARG(uap, data);
+ if (tmp >= 0) {
+#ifdef PT_STEP
+ if (req == PT_STEP)
+ signo = 0;
+ else
+#endif
+ {
+ signo = tmp;
+ tmp = 0; /* don't search for LWP */
+ }
+ }
+ else
+ tmp = -tmp;
+
+ if (tmp > 0) {
+ if (req == PT_DETACH) {
+ error = EINVAL;
+ break;
+ }
+ lwp_delref2 (lt);
+ lt = lwp_find(t, tmp);
+ if (lt == NULL) {
+ error = ESRCH;
+ break;
+ }
+ lwp_addref(lt);
+ resume_all = 0;
+ signo = 0;
+ }
+
+ /*
* From the 4.4BSD PRM:
* "The data argument is taken as a signal number and the
* child's execution continues at location addr as if it
@@ -540,7 +580,7 @@
*/
/* Check that the data is a valid signal number or zero. */
- if (SCARG(uap, data) < 0 || SCARG(uap, data) >= NSIG) {
+ if (signo < 0 || signo >= NSIG) {
error = EINVAL;
break;
}
@@ -557,7 +597,17 @@
#ifdef PT_STEP
/*
* Arrange for a single-step, if that's requested and possible.
+ * More precisely, set the single step status as requested for
+ * the requested thread, and clear it for other threads.
*/
+ LIST_FOREACH(lt2, &t->p_lwps, l_sibling) {
+ if (lt != lt2)
+ {
+ lwp_lock(lt2);
+ process_sstep(lt2, 0);
+ lwp_unlock(lt2);
+ }
+ }
error = process_sstep(lt, req == PT_STEP);
if (error) {
uvm_lwp_rele(lt);
@@ -579,8 +629,6 @@
/* not being traced any more */
t->p_opptr = NULL;
}
-
- signo = SCARG(uap, data);
sendsig:
/* Finally, deliver the requested signal (or none). */
if (t->p_stat == SSTOP) {
@@ -590,7 +638,10 @@
* an LWP runs to see it.
*/
t->p_xstat = signo;
- proc_unstop(t);
+ if (resume_all)
+ proc_unstop(t);
+ else
+ lwp_unstop(lt);
} else if (signo != 0) {
KSI_INIT_EMPTY(&ksi);
ksi.ksi_signo = signo;
--- lib/libc/sys/ptrace.2.orig 2010-04-05 10:23:27.000000000 -0400
+++ lib/libc/sys/ptrace.2 2010-04-05 10:33:17.000000000 -0400
@@ -149,7 +149,9 @@
to indicate that execution is to pick up where it left off.
.Fa data
provides a signal number to be delivered to the traced process as it
-resumes execution, or 0 if no signal is to be sent.
+resumes execution, or 0 if no signal is to be sent. If a negative
+value is supplied, that is the negative of the LWP ID of the thread to
+be resumed, and only that thread executes.
.It Dv PT_KILL
The traced process terminates, as if
.Dv PT_CONTINUE
@@ -256,8 +258,8 @@
call currently does not stop the child process so it can generate
inconsistent data.
.It Dv PT_LWPINFO
-Returns information about the specific thread from the process specified
-in the
+Returns information about a thread from the list of threads for the
+process specified in the
.Fa pid
argument.
The
@@ -274,8 +276,15 @@
.Pp
where
.Fa pl_lwpid
-contains the thread for which to get info.
+contains a thread LWP ID. Information is returned for the thread
+following the one with the specified ID in the process thread list,
+or for the first thread if
+.Fa pl_lwpid
+is 0.
Upon return
+.Fa pl_lwpid
+contains the LWP ID of the thread that was found, or 0 if there is
+no thread after the one whose LWP ID was supplied in the call.
.Fa pl_event
contains the event that stopped the thread.
Possible
@@ -303,6 +312,14 @@
Execution continues as in request PT_CONTINUE; however
as soon as possible after execution of at least one
instruction, execution stops again.
+If the
+.Fa data
+argument is greater than 0, it contains the LWP ID of the thread to be
+stepped, and any other threads are continued. If the
+.Fa data
+argument is less than zero, it contains the negative of the LWP ID of
+the
+thread to be stepped, and only that thread executes.
.It Dv PT_GETREGS
This request reads the traced process' machine registers into the
.Dq Li "struct reg"
@@ -310,6 +327,10 @@
.Aq Pa machine/reg.h )
pointed to by
.Fa addr .
+The
+.Fa data
+argument contains the LWP ID of the thread whose registers are to
+be read. If zero is supplied, the first thread of the process is read.
.It Dv PT_SETREGS
This request is the converse of
.Dv PT_GETREGS ;
@@ -319,6 +340,10 @@
.Aq Pa machine/reg.h )
pointed to by
.Fa addr .
+The
+.Fa data
+argument contains the LWP ID of the thread whose registers are to
+be written. If zero is supplied, the first thread of the process is written.
.It Dv PT_GETFPREGS
This request reads the traced process' floating-point registers into
the
@@ -327,6 +352,11 @@
.Aq Pa machine/reg.h )
pointed to by
.Fa addr .
+The
+.Fa data
+argument contains the LWP ID of the thread whose registers are to
+be read. If zero is supplied, the first thread of the process is
+read.
.It Dv PT_SETFPREGS
This request is the converse of
.Dv PT_GETFPREGS ;
@@ -336,6 +366,11 @@
.Aq Pa machine/reg.h )
pointed to by
.Fa addr .
+The
+.Fa data
+argument contains the LWP ID of the thread whose registers are to
+be written. If zero is supplied, the first thread of the process is
+written.
.\" .It Dv PT_SYSCALL
.\" This request is like
.\" .Dv PT_CONTINUE
Home |
Main Index |
Thread Index |
Old Index