Subject: parallel computing, SMP, and threading
To: Tim & Alethea Larson <thelarsons3@cox.net>
From: Erik E. Fair <fair@netbsd.org>
List: tech-cluster
Date: 04/30/2004 13:00:07
At 12:43 -0500 4/30/04, Tim & Alethea Larson wrote:
	Yes, I thought good threading was something of a prerequisite 
to SMP. What does threading do for us on a non-MP system?  Can the 
kernel scheduler get things done more efficiently with threaded apps?

	-----

To be perfectly clear, you don't need any kind of thread support
for an MP or SMP system to be useful. The utility is in having
more than one processor picking processes off the run queue and
running them in parallel.
Since UNIX loves to spawn processes, this wins for throughput right 
away even if any particular application doesn't run any faster than 
it did on a uniprocessor system with the same speed CPU. Imagine what 
an SMP does for E-mail processing on an SMTP server when the MTA 
spawns a new process for each SMTP client that has contacted it. Each 
SMTP connection is independent, and can be run in parallel. Add 
processors, speed things up (until you run into some other limit, 
like disk or RAM bandwidth).
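
To make that concrete, here's a minimal sketch in C of the
process-per-connection model (the port number and greeting are
invented for illustration - this is not how any particular MTA is
written):

/*
 * Sketch of a process-per-connection server: the parent accepts
 * connections and forks; each child serves one client, so an SMP
 * box can run the children on different CPUs without sharing any
 * memory at all.
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	struct sockaddr_in sin;
	int s, c;

	signal(SIGCHLD, SIG_IGN);	/* don't accumulate zombies */

	s = socket(AF_INET, SOCK_STREAM, 0);
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(2525);	/* invented port, for illustration */
	sin.sin_addr.s_addr = htonl(INADDR_ANY);
	bind(s, (struct sockaddr *)&sin, sizeof(sin));
	listen(s, 10);

	for (;;) {
		if ((c = accept(s, NULL, NULL)) < 0)
			continue;
		switch (fork()) {
		case 0:			/* child: serve one client */
			close(s);
			write(c, "220 hello\r\n", 11);
			/* ... handle the session ... */
			close(c);
			_exit(0);
		default:		/* parent (or failed fork): */
			close(c);	/* keep accepting */
		}
	}
}

Note there's no locking anywhere: the processes share nothing,
which is exactly why this model scales across CPUs so easily.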

Thread support isn't even required to speed up your application, if 
your application can spawn additional processes to divide the work. 
Take compiling a program with "make -j N" for N CPUs, for example. 
Make knows which parts of the build can be done in parallel (i.e.,
those that do not depend on each other), and which parts must be
serialized (do one before starting the other, e.g., running lex(1)
or yacc(1) to generate a ".c" file before running cc(1) to compile
it). Make essentially performs data flow analysis on program
compilation:

	http://foldoc.doc.ic.ac.uk/foldoc/foldoc.cgi?query=data+flow+analysis

However, you'll note that make doesn't require shared memory for its
work - merely a shared filesystem. So, if you tell make(1) how many
processors you have, it will spawn as many parallel compiles, etc.,
as it can, up to the min() of the number of CPUs (as specified by
"-j") and the number of possible parallel compiles (as permitted by
the structure of the Makefile).
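
As a toy illustration (the file names are invented), a Makefile
fragment like this lets "make -j 2" compile main.c while yacc(1) is
still generating parser.c, but forces parser.o to wait for the
generated file:

prog: parser.o main.o
	cc -o prog parser.o main.o

parser.c: parser.y
	yacc parser.y && mv y.tab.c parser.c

parser.o: parser.c
	cc -c parser.c

main.o: main.c
	cc -c main.c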

If your application has a lot of shared data that needs to be 
accessed quickly (i.e. faster than disk access), then threading makes 
sense - one process, many "threads" running in that process with a 
shared address space. Just be careful to preserve data integrity by
using semaphores to lock shared data structures before modifying
them. Also, depending on the application, you may find that the cache 
coherency and semaphore overhead eats away at some of the potential 
performance gain, if your application shares memory "too much".
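
For example, here's a minimal pthreads sketch (a mutex standing in
for the semaphore, and a bare counter standing in for real shared
data); without the lock, the four threads could corrupt the counter
on an SMP:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;			/* invented shared data */

static void *
worker(void *arg)
{
	int i;

	for (i = 0; i < 100000; i++) {
		pthread_mutex_lock(&lock);	/* lock before modifying */
		counter++;
		pthread_mutex_unlock(&lock);
	}
	return NULL;
}

int
main(void)
{
	pthread_t t[4];
	int i;

	for (i = 0; i < 4; i++)
		pthread_create(&t[i], NULL, worker, NULL);
	for (i = 0; i < 4; i++)
		pthread_join(t[i], NULL);
	printf("counter = %ld\n", counter);	/* 400000, every run */
	return 0;
}

Link with -lpthread. The answer is 400000 every run, which you
could not count on if the increments ran unlocked.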

Many of these issues are discussed in detail in the book "In Search
of Clusters" (2nd Ed.) by Gregory F. Pfister. The NetBSD Project has
a mailing list for discussing clustering for NetBSD systems: 
tech-cluster@netbsd.org

It's also important to remember Amdahl's Law:

	http://foldoc.doc.ic.ac.uk/foldoc/foldoc.cgi?query=Amdahl%27s+Law
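
In its usual statement: if P is the fraction of the work that can
be parallelized and N is the number of processors, the best speedup
you can get is

	speedup = 1 / ((1 - P) + P/N)

So with P = 0.95 and N = 8, that's 1 / (0.05 + 0.11875), or about
5.9 - and no matter how many processors you add, the serial 5%
caps you at 1 / 0.05 = 20x.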

I hope this clarifies things somewhat.

	Erik <fair@netbsd.org>