Subject: Re: field widths in ps(1).
To: None <tech-userlevel@netbsd.org>
From: Simon Burge <simonb@netbsd.org>
List: tech-userlevel
Date: 06/05/2000 21:56:55
Greg A. Woods wrote:

> Maybe do separate timings of "ps -ax", "ps -aux", and "ps -alx" too...

I did find a significant "problem" when I started working out exactly
what was using the time - I'm using hesiod for usernames, and even
though I was the only user on the box (ie it only had to look up one
name, root came from the local passwd file), the time for that was half
the total time to calculate all field widths.  I've been tinkering more
with "alx" to get timing figures now.

> > As to optimising the width calculations, currently it does for doubles
> > and ints something like:
> > 
> > 	if (mode == WIDTHMODE) {
> > 		asprintf(&buf, "%.*f", prec, val);
> 
> I was thinking it could do something even smarter that simply looked at the
> magnitude of the number, knowing what base and format it would be
> converted into, and didn't go about doing any actual conversions....

You chopped my next paragraph that said:

	>> when I could
	>> probably just use log10() to get the int and double widths with some
	>> smart math.

:)

> For example when converting to hex it's trivial to figure out how many
> digits you'll need.

Brain strain!?!  I ended up using

	fmtlen = 0;
	while (uval > 0) {
		uval >>= 4;
		fmtlen++;
	}

Is there a better way?  I wonder if "log((double)uval)/log(16.0)+1" is
faster...
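
For what it's worth, here is a minimal sketch of the shift loop above
wrapped as a function.  The name hexwidth() and the uint64_t argument
are assumptions for illustration, not what ps(1) actually uses.  It
starts the count at 1 so that a value of zero, which still prints as
"0", gets a width of 1 rather than 0.

```c
#include <stdint.h>

/*
 * Number of hex digits needed to print uval with "%x" (no "0x" prefix).
 * Hypothetical helper; name and type are assumptions for illustration.
 */
static int
hexwidth(uint64_t uval)
{
	int fmtlen = 1;		/* even zero prints as one digit, "0" */

	/* Each further non-zero nibble above the lowest adds one digit. */
	while ((uval >>= 4) != 0)
		fmtlen++;
	return fmtlen;
}
```

The log() form would need the same special case for zero, since log(0)
is not finite, and for 64-bit values the conversion to double can round
up across a power of 16 and report one digit too many, so the loop looks
like the safer of the two.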

> For decimal numbers it's a bit harder, and I'm not
> sure if I'd want to tackle doing floating point in fewer CPU cycles than
> an actual conversion would take....

It works out faster (floating point) on a PPro - I dunno how it'd be
affected on say an FPUless hpcmips machine.

> Of course for some non-numeric fields (eg. the start time, percentages,
> etc.) it's probably sufficient to hard-code the width.

Hadn't thought of percentages, I'll add them too...


Anyways, I've now got some code that uses logs to check lengths of ints
and doubles and the above code for hex numbers.  It also remembers
the previous largest value for each field so it doesn't unnecessarily
calculate lengths.  I've also added caching of the devname() lookups,
and the result runs faster than the standard ps except for when the
machine is very lightly loaded and there's only a couple of processes
attached to mostly different ttys.
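
A rough sketch of the "remember the previous largest value" idea, for
anyone who wants to picture it.  Everything here (struct column,
column_init(), column_update(), decwidth()) is a hypothetical name made
up for the example, not the code I'm planning to commit.  Note also that
decwidth() uses a plain integer divide loop in place of log10(), so the
sketch has no floating point in it; the point being shown is only the
early return that skips the width calculation.

```c
/* Hypothetical per-column state; the real ps(1) structures may differ. */
struct column {
	int maxval;	/* largest non-negative value measured so far */
	int width;	/* widest field needed so far */
};

/* Start with the header's width; -1 means "no value measured yet". */
static void
column_init(struct column *c, int headerwidth)
{
	c->maxval = -1;
	c->width = headerwidth;
}

/* Characters needed to print val with "%d", including any '-' sign. */
static int
decwidth(int val)
{
	/* Take the magnitude as unsigned so INT_MIN does not overflow. */
	unsigned int mag = val < 0 ? 0u - (unsigned int)val : (unsigned int)val;
	int width = val < 0 ? 2 : 1;	/* first digit, plus '-' if negative */

	while (mag >= 10) {
		mag /= 10;
		width++;
	}
	return width;
}

/* Account for one more value in the column. */
static void
column_update(struct column *c, int val)
{
	int w;

	/* A non-negative value no larger than the maximum can't be wider. */
	if (val >= 0 && val <= c->maxval)
		return;

	w = decwidth(val);
	if (w > c->width)
		c->width = w;
	if (val > c->maxval)
		c->maxval = val;
}
```

With a few hundred processes most values in a column fall under the
running maximum, so most calls to column_update() return before doing
any width calculation at all.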

With some instrumentation and 360 sleeps I'm seeing on a 166MHz PPro

	wincen:src/bin/psNEW 599> ./obj.i386/ps alx > /dev/null
	> init                   0.896ms
	> get procs              6.082ms
	> qsort                  4.566ms
	> get widths             5.270ms
	> print header           0.392ms
	> show results         125.101ms

and on a 500MHz Alpha PC164

	alpha:src/bin/psNEW 54> ./obj.alpha/ps alx > /dev/null
	> init                   0.977ms
	> get procs              7.814ms
	> qsort                  3.905ms
	> get widths             2.929ms
	> print header           0.977ms
	> show results          73.242ms

For David's benefit, the new version is slightly faster with 300 sleeps
and 10 cpu burners running (38.61sec vs 39.99sec for 20 runs) and with
30 cpu burners running (3min 20.91sec vs 3min 22.11sec for 100 runs).

Unless anyone has reasons not to like this, I'll commit this in the next
few days.  At this stage I plan on leaving the 'q' option out as it
doesn't really make a difference, and there's always going to be a space
between columns so there won't be any run-on.

Simon.