Subject: Re: PostgreSQL
To: Curt Sampson <cjs@cynic.net>
From: Garrett D'Amore <garrett_damore@tadpole.com>
List: tech-perform
Date: 02/03/2006 23:23:24
Curt Sampson wrote:
> On Fri, 3 Feb 2006, Garrett D'Amore wrote:
>
>> The problems with MT programs really have a lot more to do with shared
>> state, I think. Folks who don't understand locking considerations (how
>> to avoid deadlock, priority inversion, etc.) struggle with it.
>
> It's also the lack of good tools, I think. At this point, we have good
> systems for decomposing and recomposing single-threaded code (i.e.,
> modularization). For concurrent code, we don't have such things;
> every time you take two separate concurrent operations and put them
> together, you have to deal with all of the concurrency considerations
> again.
>
> This paper takes a stab at helping with this:
>
> http://research.microsoft.com/~simonpj/papers/stm/index.htm
An interesting idea -- transactional memory. I didn't read the whole
paper, but just conceptually I'm not convinced that it really eliminates
the problem. It provides developers with a different approach that may
be easier, but it also requires programmers to properly handle
idempotency. This is great for pure computation, but for stuff that
interfaces with e.g. devices, networks, etc., idempotency can be a real
gotcha.
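
To make the idempotency concern concrete, here is a minimal sketch of a
generic optimistic retry loop. It is not the design from the linked paper,
and every name in it (VersionedCell, run_transaction, packets_sent, the
interfere hook) is hypothetical. It is single-threaded on purpose: the
"concurrent" writer is simulated so the retry happens deterministically.

```python
class VersionedCell:
    """A value plus a version counter; a commit succeeds only if nobody else wrote."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def try_commit(self, new_value, seen_version):
        # Sketch only: a real implementation would make this check-and-write atomic.
        if self.version != seen_version:
            return False
        self.value = new_value
        self.version += 1
        return True


def run_transaction(cell, body, interfere=None):
    """Re-run body until its commit succeeds; return the number of attempts."""
    attempts = 0
    while True:
        attempts += 1
        value, version = cell.read()
        new_value = body(value)
        if interfere is not None and attempts == 1:
            interfere(cell)  # simulate another writer sneaking in on the first attempt
        if cell.try_commit(new_value, version):
            return attempts


packets_sent = []  # stands in for a device or network side effect


def withdraw_and_notify(balance):
    packets_sent.append("withdrew 10")  # NOT undone when the attempt is thrown away
    return balance - 10


def concurrent_deposit(cell):
    cell.value += 5
    cell.version += 1


account = VersionedCell(100)
attempts = run_transaction(account, withdraw_and_notify, interfere=concurrent_deposit)
```

The memory update is applied once, but the body ran once per attempt, so
the simulated "send" happened on every attempt. Pure computation does not
care about being re-run; a device or a network peer usually does.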

Better tools (debugging, etc.) will help. But most of the problems lie
in fundamental design. It's very hard to start with a single-threaded app
and retrofit threads onto it. Those are the worst designs.

To do threading right, you really need to design it for threading from
the ground up.

You really need to understand making your interfaces narrow, and using
data hiding to control access.

Interesting to me is that folks who deal with Java seem to do much
better at getting this right than folks who deal with C++ (or at least
the Java programs seem to suffer less from concurrency-related bugs).
This is, I think, because Java encourages good design, and proper object
oriented design. You have locks protecting the data in the object,
rather than some critical section protecting a hodgepodge of globals.
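
As a rough illustration of "locks protecting the data in the object" behind
a narrow interface, here is a minimal sketch. It is written in Python rather
than Java only to keep it short and runnable; the Counter class and the
worker function are made-up names, not taken from any real program.

```python
import threading


class Counter:
    """Owns its data and the lock that guards it; callers never see either."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0  # only touched while holding self._lock

    def increment(self):
        with self._lock:
            self._count += 1

    def value(self):
        with self._lock:
            return self._count


def worker(counter, n):
    # Callers use the two public methods and cannot forget to take the lock.
    for _ in range(n):
        counter.increment()


hits = Counter()
threads = [threading.Thread(target=worker, args=(hits, 10000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = hits.value()
```

The locking discipline lives in one place, next to the data it protects.
With a shared global and a separate critical section, every caller has to
know which lock goes with which variable.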

Again, that's not to say it can't be done right in C or C++, just that Java
lends itself to good design. Other languages would probably do so as well.

By comparison, C++ is amongst the worst languages at encouraging good
design, object-oriented or threaded, and its already complex nature
makes it far more likely to be abused than even C.
>> 5) ultimately, threading for them does very little for DB performance --
>> most of the benefit is in connection setup, and the belief is that this
>> should not be considered in the hot-path of a DB application --
>> connection caching and pooling is seen as a better way to optimize this
>
> I'd say that that's definitely the biggie. The one relatively common
> case where processes rather than threads are a problem is when you're
> holding open a lot of connections--thousands or tens of thousands. But
> to fix that, you don't need to thread the entire app; you can just
> have a front end that maintains connections and their state, and when
> a connection needs service, it hands off the actual work to one of the
> back ends.
>
>> All that said, there is some thought that limited use of MT could
>> improve e.g. parallel sorts, etc.
>
> I wonder about that. Sorts, in particular, once they get large will
> almost inevitably be spooled out to disk, which may well be the biggest
> performance issue with them.
Computer memories are growing per Moore's law (or approaching it). Data
sets are growing -- and they are growing faster than CPU speeds.

Conversely, CPU performance seems to be lagging, especially given new
trends toward concurrency (e.g. UltraSPARC-T1, where each core performs at a
considerably slower speed than even last year's CPUs).

My bet is that we're going to see strong demand for better concurrency
not just for parallelizing I/O, but also compute tasks -- MT is likely
to be a bigger winner here than multiple processes, because the
inter-process synchronization for such things can become a major player
in performance.
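
For the shape of what I mean by a threaded compute task, here is a minimal
sketch of a chunked sort followed by a merge. It shows structure only and
says nothing about how postgres does or would do it; parallel_sort and its
workers parameter are hypothetical names. One caveat: in CPython the global
interpreter lock keeps these threads from actually sorting in parallel, so
do not expect a speedup from this particular code.

```python
import heapq
import random
from concurrent.futures import ThreadPoolExecutor


def parallel_sort(data, workers=4):
    """Sort chunks on a thread pool, then k-way merge the sorted chunks."""
    if not data:
        return []
    chunk = (len(data) + workers - 1) // workers  # ceiling division
    slices = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Threads share one address space, so each chunk is handed over by
        # reference. A process pool would have to serialize every chunk to
        # each worker and serialize the sorted result back.
        sorted_chunks = list(pool.map(sorted, slices))
    return list(heapq.merge(*sorted_chunks))


random.seed(0)
values = [random.randrange(1000) for _ in range(10_000)]
result = parallel_sort(values)
```

The hand-off between workers is where the process model pays: with
processes, moving the data (or setting up and synchronizing shared memory)
is part of every such operation.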

Sometime in the next couple of years the postgres people are going to
have to solve this, or they will lose relevance, or someone will fork
postgres and add this feature.

-- Garrett

-- 
Garrett D'Amore, Principal Software Engineer
Tadpole Computer / Computing Technologies Division,
General Dynamics C4 Systems
http://www.tadpolecomputer.com/
Phone: 951 325-2134  Fax: 951 325-2191