Subject: Re: ktrace-lwp branch
To: Nick Hudson <nick.hudson@dsl.pipex.com>
From: Chuck Silvers <chuq@chuq.com>
List: tech-kern
Date: 03/13/2005 06:30:13
On Wed, Mar 09, 2005 at 08:23:16AM +0000, Nick Hudson wrote:
> > I think that uio_procp was originally used more to refer to the address
> > space than the thread, so it would be good to take a look at which (if any)
> > paths actually need the lwp pointer and see if they can get it some other
> > way. I took a quick look at this and almost all of the uses are just to get
> > at the proc.
>
> So, I traced a few of the uio_lwp users and found the following
>
> - coda_{read,write} => coda_rdwr => VOP_{OPEN,CLOSE}
> - coda_readdir => VOP_{OPEN,CLOSE}
>
> I'm still digging.
ok, but do VOP_OPEN and VOP_CLOSE actually use the lwp argument?
the NFS versions pass it along to other functions, but tracing the
call chain down several more levels doesn't lead to anything that
actually cares. or maybe that's what you're still digging for. :-)
> FWIW, FreeBSD has done s/struct proc *uio_procp/struct thread *uio_td/
sure. another example is that solaris appears to have gotten rid of
uio_procp entirely. (well, I don't know if SVR4 ever had it.)
hpux doesn't have uio_procp or a replacement either.
> > as for the layout changes, we appear to be reinterpreting some of the
> > fields and bits in struct ktr_header.
> >
> > old:
> >
> > struct ktr_header {
> > int ktr_len; /* length of ktr_buf */
> > short ktr_type; /* trace record type */
> > pid_t ktr_pid; /* process id */
> > char ktr_comm[MAXCOMLEN+1]; /* command name */
> > struct timeval ktr_time; /* timestamp */
> > const void *ktr_buf;
> > };
> >
> >
> > new:
> >
> > struct ktr_header {
> > int ktr_len; /* length of record minus length of
> > old header */ #if BYTE_ORDER == LITTLE_ENDIAN
> > short ktr_type; /* trace record type */
> > short ktr_version; /* trace record version */
> > #else
> > short ktr_version; /* trace record version */
> > short ktr_type; /* trace record type */
> > #endif
> > pid_t ktr_pid; /* process id */
> > char ktr_comm[MAXCOMLEN+1]; /* command name */
> > union {
> > struct timeval _tv; /* v0 timestamp */
> > struct timespec _ts; /* v1 timespec */
> > } _ktr_time;
> > union {
> > const void *_buf; /* v0 unused */
> > lwpid_t _lid; /* v1 lwp id */
> > } _ktr_id;
> > };
> >
> >
> > so we're re-using the implicit padding between ktr_type and ktr_pid to
> > store the record version. that looks fine. but since we're changing the
> > layout it's be nice to take the opportunity to make it common across
> > platforms. the use of "struct timeval" (whose members are of type "long")
> > makes the files different on different platforms. old kdump binaries won't
> > be able to read the new files correctly anyway, so there's no need to make
> > the records the same size. I listed some specifics under "suggestions"
> > below.
>
> The changes above mean that old kdump binaries can decode, as normal ,for the
> default mode (i.e. without timestamps) the new ktrace header records with only
> one exception. This being the new KTR_SAUPCALL record for which the old kdump
> will print hex output.
>
> [...]
> > so to summarize my suggestions:
> >
> > - reconsider whether uio_procp really needs to change.
>
> I'm still looking at this.
>
> > - investigate eliminating proc/lwp args from some APIs.
>
> Shouldn't this be done on HEAD?
I guess my point was that it doesn't make much sense to convert proc to lwp
if we can just get rid of the argument altogether. (this suggestion would
have been more useful before all the work to convert them was done, though.)
> > - consider using all fixed-size types for ktr_header (ie. uint*_t).
> > so that trace files are more portable.
> > - get rid of the pointer from the on-disk format
>
> This is gone really and replaced with lwp id. Unless you mean remove it from
> the _ktr_id union in struct ktr_header?
for struct ktr_header to become fixed-size for all kernels, we'd have to
remove the pointer from the embedded union. (putting a kernel pointer into
an on-disk structure was just a bad idea in the first place, but that's
water under the bridge at this point.)
> > - timeval/timespec have "long", so don't use them directly.
> > choose 32- or 64-bit and use them explicitly.
> > - if there's also some way to detect the endianness, then the v1 layout
> > should be completely MI after these changes.
>
> I'm not sure I want to break old binaries in this way.
if the goal is to not break old binaries, then we should not change the
timestamps from microseconds to nanoseconds. if we're going to break the
old binaries with that change, then we might as well break them more to
achieve greater benefits.
-Chuck