Subject: Re: But why?
From: Travis Hassloch x231 <travis@EvTech.com>
Date: 10/21/1996 19:40:31
Before you consider this a flame, please read it all. I hope my tone
has stayed somewhat playful even when criticizing. Sorry otherwise.
>> firstname.lastname@example.org (Jeff Bacon) writes:
>> 3) Every BSD and SVR4 based system today, except for Linux, has a very
>> broken System call mechanism.
I disagree with "broken", except if you generalize it to "inefficient".
Your fanaticism is showing :)
>> You'd think that when people put together function call conventions
>> for a particular processor, the OS people would take a look at this
>> and find a way to take advantage of this. In fact, believe it or
>> not, they have not to this very day.
Actually, OS people have done this A LOT. Just look at L3.
Microkernel dudes have put out plethoras of papers on reducing system
call overhead, particularly since their system calls require two
Critical code paths are not unrepresented in the literature.
>> whether you are doing it in the traditional broken UNIX way or the
>> clean, fast, and superior Linux way. First I will show the Linux
Again, this sort of emotional arguing isn't likely to win Linux any converts.
The terms "clean" and "superior" aren't supported by your arguments.
It is definitely faster. However, the thing you haven't answered here
is "what do you lose by doing it this way?". The answer is "portability".
You now have to write the backend of every system call in assembler
(but see below!).
>> basically the same, but step 2 is disgustingly inefficient for
As far as I remember, though, NetBSD doesn't do any complicated unpacking;
it simply writes the args into a contiguous memory area. I could
be wrong -- perhaps there is an extra copy in there.
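To make the "contiguous memory area" idea concrete, here is a rough sketch in
C. It is not NetBSD source; every name in it (reg_t, write_args,
sys_write_sketch, dispatch_sketch, MAX_SYSCALL_ARGS) is made up for
illustration, and it assumes all arguments are register-sized. The entry path
does one copy into a flat area, and the machine-independent handler just
overlays a struct on it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYSCALL_ARGS 8          /* hypothetical upper bound */

typedef long reg_t;                 /* hypothetical register-sized type */

/* Hypothetical argument layout for a three-argument call. */
struct write_args {
    reg_t fd;
    reg_t buf;
    reg_t nbyte;
};

typedef reg_t (*handler_t)(void *);

/* MI handler: sees only a pointer to the packed argument area.
 * As a placeholder it pretends every requested byte was written. */
reg_t sys_write_sketch(void *uap)
{
    struct write_args *a = uap;
    return a->nbyte;
}

/* Stand-in for the MD entry path: one copy of the caller's arguments
 * into a contiguous area, then a call through the handler pointer. */
reg_t dispatch_sketch(handler_t h, const reg_t *trap_args, size_t nargs)
{
    reg_t area[MAX_SYSCALL_ARGS];

    assert(nargs <= MAX_SYSCALL_ARGS);
    memcpy(area, trap_args, nargs * sizeof area[0]);
    return h(area);
}
```

The memcpy is the cost being argued about: the handler stays portable C, but
every call pays for one pass over the arguments.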
>> 4) Solaris cannot even do it's own optimizations correctly because
>> SunPRO is a broken compiler.
Again, I'd say it compiles code just fine. "Broken" simply isn't
supported by your argument. It doesn't have a feature gcc has.
>> and avoid the address computation all the time? Yes, very
>> brilliant idea.
Well, clever, anyway...
>> gook which has to be written in raw assembly) code can directly
>> take advantage of this. However, the C code cannot do this
>> because SunPRO lacks a way for you to tell the compiler that
>> "hey you don't need to load things, it's already in these
>> hard coded registers"
Gee, I'm shooting in the dark here, but is it maybe because it's not part
of the C standard?
Although I realize you are enthusiastic (and should be, especially if
you are trying to "convert" people over), beware of marginalizing your
"competitors" or their products. You're speaking to a wide range of
audiences, and calling something broken when it isn't is not a good idea.
We can all find a feature Linux (or one of its flavors, anyway) doesn't
have, can't we?
>> That is gross, why even do the optimization in the first place?
[of course, maybe it _is_ faster than not doing it at all..
have you measured it?]
>> Now GCC has a way to fully take advantage of such an optimization,
>> basically all I have to do is put the following in a header file.
>> register struct task_struct *current asm("g6");
Very, VERY cool! When was this added to gcc?
Of course, you realize that this is necessarily machine-dependent.
It is also limited to globals or autos, from what it appears.
And if you use it on automatic variables, I would guess, you couldn't
put it in a header file. And it has to be a register gcc won't trounce.
I was mulling over how one could get the effect of passing-by-register
in C, and do so while eliminating or isolating any MD portions into a
small portion of the code. This combines the
1) portability advantage of stack-based passing like BSD,
2) speed of register passing like Linux.
We have three pieces of data:
1) the MD locore stuff which is written in assembler
2) the MI system call stuff written in C
3) the MD mapping so that (2) can get the data written into registers by (1).
So here's a strawman idea, I'm sure everyone will help me burn it :)
Write all system calls with no parameters, and use C globals (arg1..argN)
to pass in the data, with their declarations and the asm() stuff above
isolated into an MD header file.
The globals would obviously have to be void or void *, since different
system calls have different type args, or you'd have to have a different
global name for different types (e.g. argi1 vs argvp1). But it's not much
of an issue since you don't have language-enforced type safety when calling
a system call anyway. Since it's 'static area parameter passing',
you pay a slight penalty in readability, and it is harder to call recursively.
(Hey, I have to point out the drawbacks, no matter how small right?) :)
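Here is a minimal sketch of the strawman, under some loud assumptions: it
uses long rather than void * for the slots, the names (arg1, arg2,
sys_add_sketch, enter_syscall_sketch) are invented, and the register binding
is only attempted with gcc on x86-64, where r12 and r13 are callee-saved.
Everywhere else it falls back to plain globals, so only the "MD header" part
changes per machine.

```c
#include <assert.h>

/* --- would live in an MD header (hypothetical) --------------------- */
#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__)
/* GCC global register variables: the compiler keeps r12/r13 reserved
 * for these names throughout this translation unit. */
register long arg1 __asm__("r12");
register long arg2 __asm__("r13");
#else
/* Fallback assumption: ordinary globals, no register binding. */
long arg1, arg2;
#endif

/* --- MI code: a "system call" written with no parameters ----------- */
long sys_add_sketch(void)
{
    return arg1 + arg2;
}

/* Stand-in for the MD entry path. Real locore code would arrive with
 * the registers already loaded; here a normal C function plays that
 * role, saving and restoring the slots so its own caller is unaffected. */
long enter_syscall_sketch(long (*handler)(void), long a, long b)
{
    long saved1 = arg1, saved2 = arg2;
    long ret;

    arg1 = a;
    arg2 = b;
    ret = handler();
    arg1 = saved1;
    arg2 = saved2;
    return ret;
}
```

The save/restore in enter_syscall_sketch is also where the recursion penalty
shows up: any nested call has to park the current slots somewhere first.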
GCC would have to be smart enough to not use the registers corresponding to
arg1 ... argN when compiling a C function which included this header
without saving the values away first. This doesn't seem too hard.
The rest of the design is oriented towards minimizing this penalty.
If you can make gcc smarter here, you can make the rest of the design
Based on this limitation, you can minimize the performance impacts by
partitioning the arg1 ... argN declarations into N header files.
Then, make sure each C function only includes the header files for
the number of arguments it needs. And make sure any internal functions
don't include any.
How about some kind of linkage map to break/loosen C argument passing
standards? On static functions, this optimization could be done entirely
in gcc, and may already be done - at the penalty of screwing up debuggers.
(Personally, I'd love to see it inline functions away when told :)
Involving non-static functions would obviously require modifications to the
linker. Since we're concerned primarily with non-static functions, let's
assume we have to modify ld. The (3) above could be a link-map, a special
piece of data which you give the linker to tell it what kind of argument
passing to use. Or, (3) could be data embedded in the .o file, which
was created by the C compiler (and probably would require changes to
the object file format).
In either case, the C compiler would leave the procedure entry and exit
"unfinished", and the linker would have to have some smarts;
it would be writing very small bits of "glue code" to pull parameters
from the stack for standard C/Pascal linkage, and to do similar fixups at
I'm not familiar with gcc, so I'm not confident at all that this is
practical. I'm guessing potential pitfalls are:
1) the linker has to be too smart
2) the compiler has to leave the procedure entry and exit points in
such a generic state that certain optimizations cannot be performed
3) you lose the distinction between compiler and linker -- you start
wondering, "gee, can I do some interprocedural dataflow analysis in
here" and pretty soon you are requiring every part of the code
to be resident in order to do those optimizations
>> Tada, now GCC will fully understand what I have done for it.
>> Under SparcLinux this optimization alone took away 115 instructions
>> in the scheduler sources, and it took ~50 instructions out of the
>> exit() handling, and it took ~65 instructions out of the fork()
Nice... I'm including this one in here for the NetBSD folks to consider...
>> I hope that explains some of it, and gives people at least some sort
>> of idea of the kinds of things that makes Linux scream on just about
>> any hardware.
Remember folks, maximum velocity is very important, but it ain't the whole
story. As one of my friends vitriolized; "sure, if you're one of those
classless morons who only cares about top speed". Rude, true, and
>> I'd be more than happy to chat with you via email about it or
>> similar. I love talking about performance issues on various
>> processors and systems.
I am eagerly awaiting constructive responses. I don't care about OS
religious wars, but I'd love to do something to improve the situation for
one or for all.
I also do not intend to follow up on the newsgroup since the topic has
drifted considerably. In your replies, you might suggest a relevant forum!
>> David S. Miller