Subject: kern/10113: procfs cmdline is (politely) not quite right
To: None <gnats-bugs@gnats.netbsd.org>
From: None <kre@munnari.OZ.AU>
List: netbsd-bugs
Date: 05/13/2000 14:25:12
>Number:         10113
>Category:       kern
>Synopsis:       cat /proc/curproc/cmdline returns EINVAL (on alpha) (and more)
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat May 13 14:26:01 PDT 2000
>Closed-Date:
>Last-Modified:
>Originator:     Robert Elz
>Release:        NetBSD-current 2000-05-13
>Organization:
	University of Melbourne
>Environment:
	
System: NetBSD lavender.cs.mu.OZ.AU 1.4U NetBSD 1.4U (LAVENDER) #28: Sun May 14 06:24:07 EST 2000     kre@lavender.cs.mu.OZ.AU:/usr/src/sys/arch/alpha/compile/LAVENDER alpha


>Description:
	The cmdline implementation in procfs is bogus.  It's possible that
	part of the fix is a workaround of a UVM problem - that is, when
	(internally) accessing the top of the process VM (the end of the
	args) a request for I/0 of a PAGE_SIZE'd block starting at less
	than a PAGE_SIZE from the end of the mem space returns EINVAL
	rather than the data that is available.  Whether this is a bug
	in UVM or not depends upon how it is defined to work, and I was
	unable to determine that.   (Simon Burge found that problem, and
	provided the basis of the workaround/fix).

	Then, the cmdline function is unable to read more than one
	page of args, and a good thing too, as the way it is written
	attempting to get more than that would reference into lala land.

	And, on an attempt to read a lot of data when the above is
	fixed, most of the data won't be returned, only the final block
	of any read.

>How-To-Repeat:
	On an alpha, "cat /proc/curproc/cmdline" and note the EINVAL
	(the environment may need to be limited to rather smaller than
	a page of data to experience the problem.)

	Then, also try
		cd /usr/include
		cat /proc/curproc/cmdline * */* * */* * */* |
			tr \\0 \\012 |
			sed 10q

	and examine the output.   Create some other tool that will
	read the cmdline, but not all the rest of the args, and try
	the above using it, and examine the end of the output (using cat
	you get all those files, over and over again - generating a lot of
	output, so be wary of doing that...).

	Notice that nothing is quite the way it it should be.

>Fix:
	Apply the patch below.   This contains a workaround for the
	UVM problem (whether it be a problem in UVM, or a problem in the
	way that the cmdline code was using the UVM interfaces).
	That was developed by Simon Burge.

	It also allows the entire arg space to be read, or any part of
	it, starting at any offset, and will return as much data as is
	available in a single read (to the limit of the minimum of the
	amount requested, and the number of bytes of args available)

	Issues:  This code could be simplified, and made much more
		effecient if the ps_strings struct contained the byte
		count of the args (and the environ) as well as the string
		counts.   As it is now, the cmdline code has to read the
		entire arg space, every time it is called, just to make
		sure that accesses aren't attempted beyond the end of the
		arg string space - knowing the byte count would allow just
		the data requested to be accessed.

		It would be trivial to create /proc/N/environ based upon
		the cmdline code (could use same code with an arg/env flag).
		This is probably worth doing.

		While working on this, I noticed that while cmdline has
		special case code to deal with SZOMB processes, no
		zombie processes ever show up in /proc.  The code in cmdline
		is still needed, as the cmdline could have been opened
		before the process turned into a zombie, so that's OK.
		But, this would make implementing a (kind of) ps out of
		the contents of /proc impossible, as none of the zombie
		processes would be visible, and that's no good.

		There are round_page() and trunc_page() macros, but
		no page_offset() or anything quite like it I could see.
		That is, where page_trunc(n) is ((n) & ~PAGE_MASK)
		page_offset(n) would be ((n) & PAGE_MASK).  That expression
		is used more than 15 times in the MI kernel code, a macro
		for it (even though it is pretty simple) might be nice.

	Nothing is provided here to deal with any of those.

	This patch is from NetBSD-current as it is now.  I think it will
	almost apply to 1.4.1 and 1.4.2 and probably several other variants.
	The only change that's been made since 1.4.1 is substituting use of
	the P_ZOMBIE() macro for an explicit test of p_stat == SZOMB
	(which is not changed by this patch) - but does appear in the
	context, so it isn't going to apply cleanly.   I actually worked
	on this under 1.4U but nothing has changed since then.   The diff is
	against -current as I fetched it from the AU sup mirror about
	24 hours ago.

	Once this has been tested in -current, it would be nice if it was
	pulled up to 1.4.3

--- procfs_cmdline.c.WAS	Fri Jul 23 22:01:07 1999
+++ procfs_cmdline.c	Sun May 14 07:04:35 2000
@@ -85,7 +85,14 @@
 	 */
 	if (P_ZOMBIE(p) || (p->p_flag & P_SYSTEM) != 0) {
 		len = snprintf(arg, PAGE_SIZE, "(%s)", p->p_comm);
-		goto doio;
+		xlen = len - uio->uio_offset;
+		if (xlen <= 0) 
+			error = 0;
+		else
+			error = uiomove(arg, xlen, uio);
+
+		free(arg, M_TEMP);
+		return (error);
 	}
 
 	/*
@@ -142,14 +149,15 @@
 	 */
 	len = 0;
 	count = pss.ps_nargvstr;
-	upper_bound = round_page(uio->uio_offset + 1);
-	for (; count && len < upper_bound; len += PAGE_SIZE) {
+	upper_bound = round_page(uio->uio_offset + uio->uio_resid);
+	for (; count && len < upper_bound; len += xlen) {
 		aiov.iov_base = arg;
 		aiov.iov_len = PAGE_SIZE;
 		auio.uio_iov = &aiov;
 		auio.uio_iovcnt = 1;
 		auio.uio_offset = argv + len;
-		auio.uio_resid = PAGE_SIZE;
+		xlen = PAGE_SIZE - ((argv + len) & PAGE_MASK);
+		auio.uio_resid = xlen;
 		auio.uio_segflg = UIO_SYSSPACE;
 		auio.uio_rw = UIO_READ;
 		auio.uio_procp = NULL;
@@ -157,39 +165,30 @@
 		if (error)
 			goto bad;
 
-		for (i = len; i < (len + PAGE_SIZE) && count != 0; i++) {
+		for (i = 0; i < xlen && count != 0; i++) {
 			if (arg[i] == '\0')
 				count--;	/* one full string */
 		}
 
-		if (count == 0) {
-			/* No more argv strings, set up len and break. */
-			len = i;
-			break;
+		if (count == 0)
+			i--;		/* exclude the final NUL */
+
+		if (len + i > uio->uio_offset) {
+			/* Have data in this page, copy it out */
+			error = uiomove(arg + uio->uio_offset - len,
+			    i + len - uio->uio_offset, uio);
+			if (error || uio->uio_resid <= 0)
+				break;
 		}
 	}
-	if (len > 0)
-		len--;			/* exclude last NUL */
 
+ bad:;
 	/*
 	 * Release the process.
 	 */
 	PRELE(p);
 	uvmspace_free(p->p_vmspace);
 
- doio:
-	xlen = len - uio->uio_offset;
-	if (xlen <= 0) 
-		error = 0;
-	else
-		error = uiomove(arg + trunc_page(len), xlen, uio);
-
-	free(arg, M_TEMP);
-	return (error);
-
- bad:
-	PRELE(p);
-	uvmspace_free(p->p_vmspace);
 	free(arg, M_TEMP);
 	return (error);
 }
>Release-Note:
>Audit-Trail:
>Unformatted: