Subject: Re: Some interesting papers on BSD ...
To: Michael Hancock <michaelh@cet.co.jp>
From: Terry Lambert <terry@lambert.org>
List: tech-kern
Date: 07/15/1996 14:01:00
> > > Solaris kernel also has a debugging feature, it's called "deadman".
> > > Every so often a timer based callout runs which runs down all the
> > > mutex/semaphore/spinlock holder lists in the kernel and panics if any
> > > circular and/or deadlock cases are found.
> > 
> > The IBM database design manual indicates several methods of fast
> > computation of transitive closure over acyclic directed graphs
> > which make this kind of kludge unnecessary.  IMO, deadlock
> > avoidance beats deadlock detection any day of the week.  Consider
> > the issue of a two context deadly embrace for a resource: how do
> > you unwind state so you don't lose already processed context?  In
> > the deadlock detection case, the answer is "you probably don't".  Or
> > you go to some overly complicated cruft, like the system call
> > interrupt code in the trap, using a longjmp to unwind the stack.
> 
> How is this different from implementors carefully following a set of rules
> in terms of performance?  Are there just too many permutations that it
> just isn't reasonable to expect to be able to avoid deadlocks without
> computing transitive closure across directed graphs?

IMO, the implementors should carefully follow a set of rules, in all
cases.  This is simply a requirement of effective engineering.
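
To make "a set of rules" concrete: the usual discipline is a global
lock ordering, where a context may only take locks at strictly
increasing levels.  A minimal sketch, assuming invented names and a
bare assert standing in for real enforcement (this is illustration,
not code from any kernel):

#include <assert.h>

#define MAXHELD 16                      /* arbitrary per-context limit */

struct lockctx {
        int     lc_levels[MAXHELD];     /* levels of locks already held */
        int     lc_nheld;
};

/*
 * The rule: only acquire a lock whose level is strictly greater than
 * every level already held.  If every code path obeys this, no cycle
 * can form among waiters, so no deadlock.
 */
void
ordered_acquire(struct lockctx *ctx, int level)
{
        int i;

        for (i = 0; i < ctx->lc_nheld; i++)
                assert(level > ctx->lc_levels[i]);
        assert(ctx->lc_nheld < MAXHELD);
        /* ... actually take the lock here ... */
        ctx->lc_levels[ctx->lc_nheld++] = level;
}

void
ordered_release(struct lockctx *ctx, int level)
{
        /* releases assumed LIFO, purely to keep the sketch short */
        assert(ctx->lc_nheld > 0 &&
            ctx->lc_levels[ctx->lc_nheld - 1] == level);
        ctx->lc_nheld--;
        /* ... actually drop the lock here ... */
}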

The transitive closure computation should occur for several reasons,
not just for debugging or as a built-in failsafe.

The primary reason for computing transitive closure across the
graph is to allow asynchronous entry of a lock cycle: you want this
to increase concurrency, apart from any other reasons.
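
As a rough sketch of what that computation looks like (the names and
the adjacency representation below are invented for the example; this
is not code from any kernel): before a context is allowed to block on
a held lock, walk the wait-for graph from the holder, and refuse the
request if the requester is reachable, i.e. if granting the wait
would close a cycle.

#include <errno.h>
#include <stddef.h>

#define MAXCTX  64                      /* max concurrent contexts */

struct lock;

struct context {
        int             ctx_id;         /* index into waits_for[] */
};

struct lock {
        struct context  *lk_holder;     /* current owner, NULL if free */
};

/* waits_for[i][j] != 0 iff context i is blocked on a lock held by j */
static char waits_for[MAXCTX][MAXCTX];

/*
 * Is "target" reachable from "from" in the wait-for graph?  This is
 * the transitive closure question; a depth-first walk answers it.
 */
static int
reaches(int from, int target, char *seen)
{
        int k;

        if (from == target)
                return (1);
        if (seen[from])
                return (0);
        seen[from] = 1;
        for (k = 0; k < MAXCTX; k++)
                if (waits_for[from][k] && reaches(k, target, seen))
                        return (1);
        return (0);
}

/*
 * Called before a context blocks on a held lock: if the holder already
 * (transitively) waits on something this context holds, sleeping would
 * close a cycle, so fail with EDEADLK and let the caller back out with
 * its state intact instead of hanging.
 */
int
lock_wait_check(struct context *req, struct lock *lk)
{
        char seen[MAXCTX] = { 0 };
        struct context *owner = lk->lk_holder;

        if (owner == NULL)
                return (0);             /* uncontended */
        if (reaches(owner->ctx_id, req->ctx_id, seen))
                return (EDEADLK);
        waits_for[req->ctx_id][owner->ctx_id] = 1; /* record the wait */
        return (0);
}

The point of doing it this way is that the requester finds out before
it sleeps, so it can release what it holds and retry, rather than
having to be unwound after the fact.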

A nice secondary reason is to implement thread anonymity in the
kernel multithreading and SMP cases: a processor or a kernel thread
is a preemptible context which holds resources.  If you support
kernel preemption (either through another processor entering the
kernel or through allowing simultaneous threads of control to coexist
and switch at other than tsleep/quantum expiration), then you need
to support the concept of deadlock resolution (detection or avoidance)
in the thread context in the kernel itself.
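
For the detection half, the "deadman" approach quoted above amounts
to a periodic walk of the holder/waiter chains.  A hedged sketch (the
structure and function names are invented; this is not the Solaris
code):

#include <stddef.h>

struct dmutex;

struct dthread {
        struct dmutex   *t_sleeping_on; /* mutex this thread sleeps on */
};

struct dmutex {
        struct dthread  *m_owner;       /* current holder, or NULL */
};

extern void panic(const char *fmt, ...);

/*
 * Run from a periodic timeout: starting at each blocked thread, follow
 * the "sleeping on" -> "owned by" chain; if the chain leads back to
 * the starting thread, the waiters form a cycle and we are deadlocked.
 */
void
deadman_check(struct dthread **threads, int nthreads)
{
        struct dthread *t;
        struct dmutex *m;
        int i, hops;

        for (i = 0; i < nthreads; i++) {
                t = threads[i];
                for (hops = 0; hops < nthreads; hops++) {
                        m = t->t_sleeping_on;
                        if (m == NULL || m->m_owner == NULL)
                                break;  /* chain ends: no cycle */
                        t = m->m_owner;
                        if (t == threads[i]) {
                                panic("deadman: deadlock detected");
                                break;
                        }
                }
        }
}

Note that by the time this fires the damage is done; all it can do is
panic, which is exactly the "you probably don't" unwind problem above.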

Microkernels tend to punt on this, stating that "servers" (threads
outside the scope of the microkernel itself) operate in a separate
protection domain.  In the primary protection domain, the issue is
ignored, and you are back to your user/kernel separation, with a
single kernel thread of control.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.