Subject: Solaris MP
To: None <netbsd-advocacy@netbsd.org>
From: Miles Nordin <carton@Ivy.NET>
List: netbsd-advocacy
Date: 12/12/1999 00:22:15
I've ranted on this list before about how the truly relevant goals in
MT/SMP work are poorly-understood (or more reasonably, simply incompletely
implemented) by FreeBSD and Linux. As part of a truly atrocious operating
systems class, I was told to read the following interesting paper:
http://csel.cs.colorado.edu/~vasa/courses/csci3753/1061.pdf
It is a White Paper, meaning that it is a pseudo-technical paper with the
stated purpose of trying to sell Sun products, rather than to educate or
advance human knowledge like a true research paper. However, this also
means a non-technical person can fairly easily get something out of it.
The paper doesn't get me as close to understanding the link between MT/SMP
and real-time scheduling as I'd like to be, given that I keep bringing it
up without knowing what I'm talking about, but it's a start. I definitely
recommend reading it if you are interested in the evolution of SMP stuff,
or if you often get into arguments about it.
Here are my observations about the paper. You should probably read the
paper before you read them, or maybe even read just the paper alone.
Seriously. Don't be lazy. Read the paper. Then come back.
Encouraging implications:
o Sun used NetBSD's originally-proposed strategy of getting MT to work
first, and then working on SMP. They found the debugging work that
they could do on uniprocessor MT useful. Almost all of their
work seems highly relevant to uniprocessor systems. This defends past
decisions, suggests that we are on the right track, and makes an
encouraging statement about the future of NetBSD MT/SMP.
o The implementation of Sun's locking primitives is supposedly designed
to promote clean code. At least the idea that some implementations
are better at promoting future clean code than others puts this stuff
right up NetBSD's alley, as far as NetBSD's ability to make a useful
contribution to the field.
o ``Having threads'' and ``using threads'' are not as closely
intertwined as one might pessimistically assume. MT is still a
gigantic work, but it's not necessarily something that can't be
committed until it's finished. The paper mentions a distinction
between MT-safe and MT-hot, for example, and hints at instances
where merely being ``MT-safe'' is trivial. It may well be practical
to debug and commit an MT framework with only a few MT-hot subsystems,
perhaps subsystems which are MT-hot only for the purpose of debugging
the framework, yet still be architecturally well ahead of global-lock
implementations. There are a lot of decisions to make in the
implementation that have nothing to do with boasting about how many
locks you have--indeed, the entire paper is written about such
decisions.
o Threads are not necessarily a burden. Application programmers use
them not because they want to annoy us, but because it makes their
work easier, their code cleaner, and their goals easier to reach. As
the paper states outright, kernel code is likely to enjoy these same
benefits. Adding MT to the kernel shouldn't be misconceived as
something that will make every bit of kernel code from then on harder
to write--it may well do just the opposite.
o Kernel threads decrease the relevancy of certain burdensome optimizations
that one needs to worry about in an old-school kernel. For example,
an interrupt handler doesn't need to return as quickly if it can
simply throw itself onto the regular scheduler when it starts running
too long. Thus, threads could improve the performance of the kernel,
accelerate the pace of kernel development, promote the sanity and
happiness of kernel developers--or all three.
o Having threads in the NetBSD kernel will permit us to write kernel
code and device drivers that others can't use at all, or can't use
to the fullest benefit.
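
To make the MT-safe versus MT-hot point above a little more concrete,
here is a rough userland sketch with POSIX threads. It is not taken from
the paper or from any kernel; it assumes "MT-safe" means "correct under
concurrent callers, even if one coarse lock serializes them" and "MT-hot"
means "locked finely enough that callers rarely wait on each other". All
of the names (coarse_hit, hot_hit, run_workers, NBUCKETS, MAX_THREADS)
are made up for illustration.

```c
#include <assert.h>
#include <pthread.h>

#define NBUCKETS    4   /* hypothetical table size */
#define MAX_THREADS 16  /* hypothetical upper bound for this sketch */

/* MT-safe flavour: one coarse lock serializes every caller. */
static pthread_mutex_t coarse_lock = PTHREAD_MUTEX_INITIALIZER;
static long coarse_hits[NBUCKETS];

static void coarse_hit(unsigned key)
{
    pthread_mutex_lock(&coarse_lock);
    coarse_hits[key % NBUCKETS]++;
    pthread_mutex_unlock(&coarse_lock);
}

/* MT-hot flavour: one lock per bucket, so callers that touch
 * different buckets do not wait on each other. */
struct bucket {
    pthread_mutex_t lock;
    long hits;
};

static struct bucket hot[NBUCKETS] = {
    { PTHREAD_MUTEX_INITIALIZER, 0 },
    { PTHREAD_MUTEX_INITIALIZER, 0 },
    { PTHREAD_MUTEX_INITIALIZER, 0 },
    { PTHREAD_MUTEX_INITIALIZER, 0 },
};

static void hot_hit(unsigned key)
{
    struct bucket *b = &hot[key % NBUCKETS];

    pthread_mutex_lock(&b->lock);
    b->hits++;
    pthread_mutex_unlock(&b->lock);
}

/* Each worker exercises both subsystems the same number of times. */
static void *worker(void *arg)
{
    unsigned iters = *(const unsigned *)arg;

    for (unsigned i = 0; i < iters; i++) {
        coarse_hit(i);
        hot_hit(i);
    }
    return NULL;
}

/* Run nthreads workers to completion; 0 on success, -1 on error. */
int run_workers(unsigned nthreads, unsigned iters)
{
    pthread_t tid[MAX_THREADS];
    unsigned started = 0;
    int rc = 0;

    assert(nthreads <= MAX_THREADS);
    while (started < nthreads) {
        if (pthread_create(&tid[started], NULL, worker, &iters) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    /* iters lives on our stack, so join before returning. */
    for (unsigned i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return rc;
}

long coarse_total(void)
{
    long sum = 0;

    pthread_mutex_lock(&coarse_lock);
    for (int i = 0; i < NBUCKETS; i++)
        sum += coarse_hits[i];
    pthread_mutex_unlock(&coarse_lock);
    return sum;
}

long hot_total(void)
{
    long sum = 0;

    for (int i = 0; i < NBUCKETS; i++) {
        pthread_mutex_lock(&hot[i].lock);
        sum += hot[i].hits;
        pthread_mutex_unlock(&hot[i].lock);
    }
    return sum;
}
```

Both halves give the same totals; the difference is only in how much the
callers get in each other's way. The coarse half is the kind of thing one
could plausibly commit early, converting subsystems to the second style
one at a time.
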
Discouraging implications:
o Having threads in the NetBSD kernel will encourage us to write kernel
code and device drivers that others can't use at all, or can't use
to the fullest benefit.
o Sun threw out the BSD codebase before they started working on this.
We can't do that. However, if and when we finish, we're likely to have
superior code to Sun's, since free Unixes typically already beat
Solaris on performance and system call overhead (and filesystem
advances, VM robustness, software RAID usefulness, networking
implementation, source code usability, having a well-maintained
Coda port, pleasant non-committee-moron documentation, and the
fact that they come with working C compilers, and stuff.)
o Their implementation required writing debugging and profiling
tools, and doing some fairly ambitious testing and optimization. This
offers some perspective on just how unreasonable an undertaking this
is for a single person working unfunded (in his free time).
o The implementation of synchronization primitives as function calls
rather than language primitives probably limits egcs's ability to make
useful optimizations. For example, with language-level primitives the
compiler could keep shared (volatile) variables cached in a register until
the end of a critical section. Perhaps we could
work around this by attaching some egcs-hint to the function through a
header file #define, but more likely this problem will remain out of
our hands for a long time, if not forever.
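
As a rough illustration of the register-caching point in the last item,
here is a userland sketch in which POSIX mutexes stand in for kernel
synchronization primitives. Because the lock is only a function call,
the compiler has no idea that it protects the volatile counter, so it
must load and store the counter on every access. The second function does
by hand what a compiler that understood critical sections could in
principle do itself. The names (stats_lock, nonempty, count_naive,
count_cached, read_nonempty) are hypothetical, and nothing here says
anything about what egcs actually does or could be taught to do.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;
static volatile unsigned long nonempty; /* shared; guarded by stats_lock */

/* Straightforward version: every update of the volatile counter
 * is a separate load and store, even though we hold the lock. */
void count_naive(const unsigned *lens, size_t n)
{
    assert(lens != NULL || n == 0);
    pthread_mutex_lock(&stats_lock);
    for (size_t i = 0; i < n; i++)
        if (lens[i] != 0)
            nonempty = nonempty + 1;
    pthread_mutex_unlock(&stats_lock);
}

/* Hand-cached version: one load on entry and one store on exit.
 * The local is not volatile, so it may live in a register. */
void count_cached(const unsigned *lens, size_t n)
{
    assert(lens != NULL || n == 0);
    pthread_mutex_lock(&stats_lock);
    unsigned long local = nonempty;
    for (size_t i = 0; i < n; i++)
        if (lens[i] != 0)
            local++;
    nonempty = local;
    pthread_mutex_unlock(&stats_lock);
}

/* Read the counter under the same lock. */
unsigned long read_nonempty(void)
{
    pthread_mutex_lock(&stats_lock);
    unsigned long v = nonempty;
    pthread_mutex_unlock(&stats_lock);
    return v;
}
```

The two functions compute the same thing; the second is only safe because
the programmer knows that stats_lock covers every access to the counter,
which is exactly the knowledge the compiler lacks.
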
The usual problem with my emails applies: although I may use pretentious
language, my actual understanding of the subject is pretty poor and
simplistic. Since the people who do understand would probably prefer to
write code than rant for several pages on mailing lists, the chances of my
stating a convincing lie and never getting corrected are high. (E.g., it's
happened before--I said NTFS isn't transaction-based, when it does have
some kind of transaction log.)
That said, I'd obviously welcome potentially interesting comments.
--
Miles Nordin / v:1-888-857-2723 fax:+1 530 579-8680
555 Bryant Street PMB 182 / Palo Alto, CA 94301-1700 / US