Subject: Re: what is the status of the ktrace-lwp branch?
To: Christos Zoulas <christos@astron.com>
From: Chuck Silvers <chuq@chuq.com>
List: tech-kern
Date: 10/30/2005 09:42:20
On Sun, Oct 30, 2005 at 05:01:59AM +0000, Christos Zoulas wrote:
> In article <20051029210343.99717564FA@rebar.astron.com>,
> Christos Zoulas <christos@zoulas.com> wrote:
> >On Oct 29,  8:21pm, skrll@netbsd.org (Nick Hudson) wrote:
> >-- Subject: Re: what is the status of the ktrace-lwp branch?
> >
> >| On Saturday 29 October 2005 18:21, Christos Zoulas wrote:
> >| > It would be nice to merge it in at somepoint.
> >| 
> >| Two things needs to be resolved
> >| 
> >| 1) the format of struct ktr_header - chuq made some suggestions to make it MI
> >|    (iirc)
> >| 
> >| 2) the uio_proc -> uio_lwp change in struct uio. chuq wasn't keen. I traced
> >|    through the reason behind this and its difficult not to do it.
> >
> >Can we please have a summary here? I am almost done merging it to head
> >(~70 files with conflicts).
> 
> Ok, now that I've finished merging it, and have it all working including
> userland changes I understand a little better what's going on here:
> 
> 1. We could encode the version inside the type field to avoid the endianness
>    issue, but I don't see this as a big deal. I think I prefer what nick has
>    now.

this issue isn't endianness, it's 32-vs-64.  the ktrace file format uses
this structure:

struct ktr_header {
	int	ktr_len;		/* length of ktr_buf */
	short	ktr_type;		/* trace record type */
	pid_t	ktr_pid;		/* process id */
	char	ktr_comm[MAXCOMLEN+1];	/* command name */
	struct	timeval ktr_time;	/* timestamp */
	const void *ktr_buf;
};

and then can be a variable-length blob after that.  the timeval, buf pointer
(what the heck is a kernel pointer doing in a file format anyway?), and the
contents of the variable blob will all be different when written by 32-bit
vs. 64-bit kernels.  there's no good way to tell which format the file is,
so they're basically not portable between platforms.   (oh, the implicit
padding in this structure is bound to cause compatibility problems as well.)

if we're going to make the records incompatible at all (which nick was
proposing to do by changing the timeval to a timespec), then we should fix
these portability issues at the same time, to minimize the number of flag days.

if we make the assertion that old kdump binaries don't need to be able to
read new ktrace files (which seems reasonable, we've done the same with
both user and kernel coredumps before, and probably other things),
then we can redefine the file format any way we want and fix everything
at once, and build in support for better handling of future format changes
too if we want.


> 2. I understand where chuq is coming from and I think I agree with him. I
>    think that the uio_ is a process wide struct and not an lwp specific one.
>    For example, what happens when the lwp exits before the uio is completed?
>    On the other hand, keeping it as uio_proc, would require us to call
>    proc_representative_lwp(uio->uio_proc) in a bunch of places; not very
>    attractive either.

the uio_procp field was intended to identify the address space that the
pointers in uio_iov refer to.  as yamt points out, in our current system
the address space corresponds to struct vmspace since multiple processes can
share the same address space.

if we want to handle async operations, we will need to make sure that our
reference to the address space (via whatever structure) remains valid until
all async operations complete.  in some systems this is done by making
exiting processes wait for aio to complete, but this could be achieved
in other ways as well.

currently all the operations that use of uio_procp to refer to the kernel LWP
that issued the request are synchronous, to avoid the previous issue in
the absence of any other way to keep the proc pointer valid.  it doesn't
really make sense that any of these would ever become async, especially
because in our scheduler-activations world, the kernel LWP doesn't really
map to anything in particular that the user process is doing.  previously
we were looking to see what all non-address-space information was being
accessed via uio_procp so that we could consider passing this information
in a more explicit fashion, and the people doing that kinda got bogged down
in that process.  I'll dig through my old mail and see what I can find.

-Chuck