Subject: Re: new sysctl(KERN_PROC, ...) interface (was: sysinfo(2))
To: Bill Studenmund <email@example.com>
From: Simon Burge <firstname.lastname@example.org>
Date: 04/17/2000 12:47:22
Bill Studenmund wrote:
> On Sun, 16 Apr 2000, Simon Burge wrote:
> > Recently there was a little talk about limiting the rate of change of
> > the size of struct kinfo_proc, primarily motivated by ps(1) complaining
> > about 'proc size mismatches' whenever some kernel structures changed
> > size.
> Eww... I've seen the thread as of today, and I don't like where we're
> going with it. :-) The only time we run into these problems is in
> -current, which we say will break ps on occasion! :-)
Now that we do many ps's on startup and shutdown with rc.d, having a
working ps on -current, no matter how up-to-date userland is, is a good
thing. Also, in theory it should be possible to use a 1.5 ps on a 1.6
kernel (untested of course :-) and so on...
> > For each process requested, the handler would memcpy (uiomove or
> > whatever) only the first elem_size bytes of each struct kinfo_proc2 to
> > the user buffer. Thus we should be able to have an old ps(1) work on a
> > new kernel without complaining about proc size mismatches.
> Ok, where is the handler? If it's in the kernel, then I have objections to
> parts of this idea. :-)
It's really not that much different to current sysctl(KERN_PROC).
For each element, it does a
	copyout(..., sizeof(struct proc));
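A minimal userland sketch of the per-element copy with elem_size, to make
the idea concrete. Everything here is an assumption for illustration:
struct kinfo_proc2_sketch is a made-up stand-in (the real structure has
many more fields), copy_procs() is a hypothetical name, and memcpy()
stands in for copyout().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct kinfo_proc2; the real layout differs. */
struct kinfo_proc2_sketch {
	int p_pid;
	int p_uid;
	int p_stat;
	int p_newfield;	/* added after the old ps(1) was compiled */
};

/*
 * Copy `count` records into `dst`, using `elem_size` as the stride the
 * caller expects.  Only the first `elem_size` bytes of each record are
 * copied; if the caller asks for more than the kernel has, the tail of
 * each element is zero-filled.  Returns the number of bytes written.
 * memcpy() stands in for copyout() in this userland sketch.
 */
size_t
copy_procs(void *dst, size_t elem_size,
    const struct kinfo_proc2_sketch *src, size_t count)
{
	size_t n = elem_size < sizeof(*src) ? elem_size : sizeof(*src);
	char *out = dst;
	size_t i;

	assert(dst != NULL && src != NULL);
	for (i = 0; i < count; i++) {
		memcpy(out + i * elem_size, &src[i], n);
		memset(out + i * elem_size + n, 0, elem_size - n);
	}
	return count * elem_size;
}
```

An old ps(1) built against a shorter structure would pass its own
sizeof as elem_size and never see the fields added since, provided new
fields are only ever appended at the end.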
> > Any basic flaws in this line of reasoning so far? Aidan - I know this
> > is not exactly what you had in mind; how much different is it?
> I guess my main objection is to the general tone of this idea and also
> things which were brought up later in the thread. The main thrust here
> is you're trying to make life easy for ps(1) when what ps(1) is trying
> to do is hard - it's trying to figure out the process list of a kernel
> later than the one for which it was compiled.
I guess I'm trying to make it not hard for ps to do that, since it's so
bloody annoying when it doesn't work :-) Seriously, I see no reason for
the kernel not to help userland know what's happening, even if
userland is too old...
> The thing I really don't like is the
> idea of doing something with sessions to get at the complete process
> list. That's trying to make a MIB-based interface work at something it's
> not good at.
This is _not_ something I was planning to handle, and isn't something
that my current implementation does. I'm with you on this one :-)
> There have been lots of discussions, and I really think the direct kernel
> groveling approach is the best. Among other things, it puts the onus of
> trying to make a list of rapidly changing things on the userland tool
> asked to do it.
> Another problem with sysctl in general is that it is very compile-time
> dependent (like how mib entry text is turned into static numbers
> then..) i.e. it's a fairly stodgy interface which isn't adept at dealing
> with kernel/userland drift. :-)
I'm not sure I buy this - the MIB numbers should not change over time.
This would absolutely kill binary compatibility because a number of libc
functions use sysctl to do their dirty work.
> Ok, so I really don't like shoving the process list through a MIB, and I
> think sysctl isn't good for where the kernel has drifted relative to the
> userland. So can I do anything other than say I don't like it? :-)
> I hope so. How about this for an idea:
> How about we shove a description of the struct proc contents into the MIB?
> While things change in struct proc, I think things like the fact that
> there is a user id, a tty, and the big facts about a process don't. Maybe
> they get added to, but that's most of it. _Where_ they are does change,
> but the fact that they are there doesn't. So just use the MIB to tell ps
> where to find things in the memory it reads when kvm groveling. ??
> So the idea would be that ps would:
> 1) read MIB entries for the things in struct proc it knew about and cared
> about when it was compiled, and figure out how big the current struct proc
> entries are.
> 2) kvm grovel a bunch of them, and use the info from step 1) to map the
> things it groveled into structures it understands. Then it does the
> normal ps stuff, and repeats.
If I understand what I think you mean, then a lot of extra info would be
needed in the kernel, like "the field named ``p_pid'' is N bytes from
the start of struct proc" so that it could be returned to userland.
We'd also need a way to reference the proc field names. At the moment,
there are 60 fields that are carried over to struct kinfo_proc2 and ps(1)
uses most of them (look at src/bin/ps/keyword.c)...
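To make the "field named p_pid is N bytes from the start" idea concrete,
here is a rough sketch of what such a description table and a lookup by
name might look like. All of it is assumed for illustration: struct
proc_sketch is a made-up, heavily trimmed structure, and field_desc,
find_field() and read_field() are hypothetical names, not an existing
kernel or libkvm interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily trimmed stand-in for the kernel's struct proc. */
struct proc_sketch {
	long p_pad;	/* things ps(1) does not care about */
	int  p_flag;
	char p_stat;
	int  p_pid;
};

/* One entry per field the kernel would describe to userland. */
struct field_desc {
	const char *name;
	size_t offset;
	size_t size;
};

#define FIELD(f) \
	{ #f, offsetof(struct proc_sketch, f), \
	  sizeof(((struct proc_sketch *)0)->f) }

static const struct field_desc proc_fields[] = {
	FIELD(p_flag),
	FIELD(p_stat),
	FIELD(p_pid),
};

/* Look a field up by name; NULL if it is not described. */
const struct field_desc *
find_field(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(proc_fields) / sizeof(proc_fields[0]); i++)
		if (strcmp(proc_fields[i].name, name) == 0)
			return &proc_fields[i];
	return NULL;
}

/*
 * Pull one named field out of a raw copy of the structure.
 * Returns 0 on success, -1 if the field is unknown or its size
 * differs from what the caller was compiled to expect.
 */
int
read_field(const void *raw, const char *name, void *out, size_t outsize)
{
	const struct field_desc *f = find_field(name);

	assert(raw != NULL && out != NULL);
	if (f == NULL || f->size != outsize)
		return -1;
	memcpy(out, (const char *)raw + f->offset, f->size);
	return 0;
}
```

Even in this toy form the table needs a name, an offset and a size for
every field, which is the extra kernel-side info mentioned above.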
> I like this idea much better because it puts the onus of dealing with
> drift on ps and the other userland utilities. I.e. rather than teaching
> the kernel how to deal with older versions of the interface, we teach the
> programs to deal with newer. :-) Routines to do this could even be added
> to libkvm. :-)
As I said above, I'd like the kernel to help out. Certainly if things
were done as you suggested, libkvm would have to deal with the mess,
otherwise maintenance of all the userland users of the functionality
would be nightmarish.
> Plus, it freezes into the MIB the part which will most likely NOT change -
> what the fields are in struct proc (like p_flag, p_stat, p_pid, ...). Even
> with what you proposed above you had concerns about obsolescence. :-)
My initial thought is that what you're proposing is a lot of work! I'm
merely revamping an existing interface, while you're proposing an
entirely new one. Given that I don't think I understand exactly what
you're suggesting (look at my paragraph on finding field offsets), I'm
not all for it at the moment ;)
Hmm, one thing occurs to me - getting a process's argv is very much
attached to the current vm machinery. What would be nice here (and
the opposite of what you probably are thinking!) is a nice little
kernel interface to return the address in either physmem or swap of a
given process' va. Then kvm_proc.c:kvm_uread() could be as simple as
sysctl(CTL_KERN, KERN_PROC_VA, pid, &type, &offset);
pread(type == swap ? swapfd : memfd, buf, nbpg, offset);