Subject: Re: LWP id into ktrace output - chapter 2.
To: Darren Reed <darrenr@reed.wattle.id.au>
From: Nathan J. Williams <nathanw@wasabisystems.com>
List: tech-kern
Date: 03/21/2003 17:55:41
This looks like pretty cool work. When I did the original
LWPification, I avoided a lot of this because after some analysis I
couldn't find a reason to propagate LWPs that low. It does appear that
ktrace is such a reason, though.

My biggest concern is about the changes in struct uio and struct
buf. The proc pointers that they currently hold are there because the
I/O in question is happening to that process's memory space; the
struct proc is used for its vmspace and a couple of other things, like
P_WEXIT, that are truly process-level and not lwp-level. Since LWPs
are relatively transient compared to processes, it seems that there is
a risk of LWPs being cached in these structures that do not exist when
the structure is used a bit later. I'm not 100% up to speed on the
myriad uses of the two structures, so I can't say with confidence that
it's safe or unsafe, but I think it needs closer examination.

The mechanical changes look decent; I have a few tweaks to suggest
(and the diff is much easier in unified diff mode than context diff
mode, since most of the changes are just single-line. Let's hear it
for emacs diff-mode for converting it for me).

There are a bunch of unnecessary whitespace and ordering changes in
variable declaration blocks throughout this diff, usually near the
introduction of a struct proc *p. They should probably be left alone;
mixing functional and style changes is bad form.

Specific file issues:


arch/powerpc/include/aout_machdep.h
 /* No special executable format (yet) */
-#define	cpu_exec_aout_makecmds(a, b)	ENOEXEC
+#define	cpu_exec_aout_makecmds(l, b)	ENOEXEC


This change seems unnecessary.


arch/powerpc/powerpc/process_machdep.c
@@ -196,7 +196,7 @@
 		uio.uio_resid = sizeof(struct vreg);
 		uio.uio_segflg = UIO_USERSPACE;
 		uio.uio_rw = write ? UIO_WRITE : UIO_READ;
-		uio.uio_procp = p;
+		uio.uio_lwp->l_proc = l;
 		return process_machdep_dovecregs(p, l, &uio);
 	}

s/uio_procp/uio_lwp->l_proc/ doesn't seem to have worked so well
here. "uio.uio_lwp = l", perhaps?


arch/x86/include/Makefile
@@ -10,6 +10,7 @@
 	intr.h intrdefs.h \
 	lock.h \
 	mtrr.h \
+	pic.h \
 	pio.h \
 	psl.h \
 	specialreg.h \

This seems like an unrelated change.


arch/x86_64/x86_64/syscall.c
@@ -317,10 +317,9 @@
 	KERNEL_PROC_UNLOCK(l);
 
 	userret(l);
-#ifdef KTRACE
 	if (KTRPOINT(p, KTR_SYSRET)) {
 		KERNEL_PROC_LOCK(l);
-		ktrsysret(p, SYS_fork, 0, 0);
+		ktrsysret(l, SYS_fork, 0, 0);
 		KERNEL_PROC_UNLOCK(l);
 	}
 #endif

Seems wrong.


In the compat code changes, there are a lot of diffs of the general form:
@@ -209,11 +209,10 @@
 		syscallarg(char *) path;
 		syscallarg(int) mode;
 	} */ *uap = v;
-	struct proc *p = l->l_proc;
 	struct sys_open_args cup;   
-	caddr_t sg = stackgap_init(p, 0);
+	caddr_t sg = stackgap_init(l->l_proc, 0);


Why make the first two changes? It seems less invasive to have the
"struct proc *p = l->l_proc;" line and leave the stakgap_init() line
alone; generally, that's why I added those "struct proc *p"
definitions. I know I wasn't 100% consistent, but when I did the
initial LWP work I tried to minimize the number of lines that I
touched by using that definition if struct proc *p instead of inline
conversions of p to l->l_proc.


All of compat/netbsd32/netbsd32_exec_elf32.c,
compat/svr4/svr4_exec_elf64.c, compat/osf1/osf1_exec_ecoff.c, and
compat/irix/irix_exec_elf32.c use LIST_FIRST(&p->p_lwps) when they
call emul_find_interp(). What state is the process in at that point?
Which process calls those routines?

Thh style around osf1_exec_ecoff_dynamic() permits initialization with
declarations; it should use the "struct proc *p = l->l_proc"
form. Similar style issues in pecoff_exec.c: exec_pecoff_makecmds()
and exec_pecoff_coff_makecmds().



ddb/db_xxx.c
@@ -153,7 +153,7 @@
 			if (p->p_stat == 0) {
 				continue;
 			}
-			l = LIST_FIRST(&p->p_lwps);
+			l = proc_representative_lwp(p);
 			db_printf("%c%-10d", " >"[cp == p], p->p_pid);
 
 			switch (*mode) {

By itself, this breaks ps/l in ddb, since that iterates over LWPs with
l. Adding a 'l = LIST_FIRST(...)' in the ps/l case should fix it.


dev/dksubr.c: In dk_lookup(), the conversion looks unfinished. p isn't
initialized (needs to be for p->p_ucred), and there are some vn_*()
calls still using p where l seems appropriate.


dev/pci/if_wi_pci.c: These changes seem entirely unrelated to the lwp thing.


rf_copyback.c: LIST_FIRST().
rf_disks.c: LIST_FIRST().
rf_reconstruct.c: LIST_FIRST().

I'm not terribly familiar with the raidframe code, but since it seems
that they're just pulling the LWP off of *_thread variables, perhaps
the RF_Thread_t type should be made an LWP pointer itself.

Similar issue in the USB code: it currently uses usb_proc_ptr for
cross-BSD portability; would it be feasable to make usb_proc_ptr into
struct lwp * instead of creating local differences?


fs/ntfs/ntfs_vfsops.c
@@ -469,16 +477,16 @@
 		return (EBUSY);
 #if defined(__FreeBSD__)
 	VN_LOCK(devvp, LK_EXCLUSIVE | LK_RETRY, p);
-	error = vinvalbuf(devvp, V_SAVE, p->p_ucred, p, 0, 0);
-	VOP__UNLOCK(devvp, 0, p);
+	error = vinvalbuf(devvp, V_SAVE, p->p_ucred, l, 0, 0);
+	VOP__UNLOCK(devvp, 0, l);
 #else
-	error = vinvalbuf(devvp, V_SAVE, p->p_ucred, p, 0, 0);
+	error = vinvalbuf(devvp, V_SAVE, l->l_proc->p_ucred, l, 0, 0);
 #endif
 	if (error)
 		return (error);

Probably don't want to change 'p' to 'l' in code inside #ifdef FreeBSD.


The vget() and VFS_VGET() routines have an interface change! This
needs to be documented in vnode(9).


kern/init_main.c:
@@ -165,7 +165,7 @@
 
 __volatile int start_init_exec;		/* semaphore for start_init() */
 
-static void check_console(struct proc *p);
+static void check_console(struct lwp *p);
 static void start_init(void *);
 void main(void);
 
Probably want to change the variable name as well as the type.


kern/kern_acct.c
@@ -337,10 +338,10 @@
 	/*
 	 * Now, just write the accounting information to the file.
 	 */
-	VOP_LEASE(acct_vp, p, p->p_ucred, LEASE_WRITE);
+	VOP_LEASE(acct_vp, proc_representative_lwp(p), p->p_ucred, LEASE_WRITE);

This use of proc_representative_lwp() is odd, as we've got a
convenient LWP variable.



net/if.c
@@ -1304,15 +1304,23 @@
 	return (NULL);
 }
 
+void bpx(struct proc *);
+void bpx(p)
+struct proc *p;
+{
+	static int pid;
+	pid = p->p_pid;
+}
+

What's this about?


netsmb/smb_subr.c:

-smb_proc_intr(struct proc *p)
+smb_proc_intr(struct lwp *l)
 {
+	struct proc *p;
+
+	if (l == NULL)
+		return 0;
+	p = l->l_proc;
 	if (p == NULL)
 		return 0;
 	if (!sigemptyset(&p->p_sigctx.ps_siglist)

If l isn't null, then p won't be null; the p==NULL test can be
removed.


nfs/nfs_socket.c, uvm/uvm_swap.c:
proc_representative_lwp(&proc0) is just lwp0.

nfs/nfs_syscalls.c:

@@ -486,6 +486,10 @@
 		nfs_numnfsd++;
 	}
 	PHOLD(l);
+
+	if (nfsd->nfsd_procp != p)
+		panic("nfsd_procp not being called for self");
+
 	/*
 	 * Loop getting rpc requests until SIGKILL.
 	 */

Adding this doesn't seem to make sense.... by my reading of the NFS
code, the nfsd structure is allocated per nfsd server process, and so
this will always be true. In fact, I don't see what the point of the
nfsd_procp value is at all, since it's only used in nfssvc_nfsd() and
there's a local variable there.


        - Nathan