netbsd-bugs: misc/4543: spl(9) manpage in conflict with reality

Subject: misc/4543: spl(9) manpage in conflict with reality
To: None <gnats-bugs@gnats.netbsd.org>
From: Jonathan Stone <jonathan@DSG.Stanford.EDU>
List: netbsd-bugs
Date: 11/19/1997 06:31:39
>Number:         4543
>Category:       misc
>Synopsis:       spl(9) page conflicts with actual implementation
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    misc-bug-people (Misc Bug People)
>State:          open
>Class:          doc-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Nov 19 06:35:02 1997
>Last-Modified:
>Originator:     
>Organization:
	
>Release:        NetBSD 1.3_ALPHA as at 19971117, also NetBSD 1.2G
>Environment:
	
System: NetBSD Cup.DSG.Stanford.EDU 1.3_ALPHA NetBSD 1.3_ALPHA (PLACEBO) #2: Wed Oct 22 16:43:14 PDT 1997 jonathan@Whisk.DSG.Stanford.EDU:/usr/src/sys/arch/i386/compile/PLACEBO i386


>Description:

spl(9) describes the synchronization protocol used inside the kernel.
But the description is not correct. The man page begins

     These functions raise and lower the system priority level.  They are used
     by kernel code running at any given priority level to block higher-prior-
     ity interrupts, so that it can safely access variables or data structures
     which are used by kernel code that runs at a higher priority level.


where historically, and on at least the mips and vax ports, the
converse is actually true: spltty() blocks tty interrupts and _lower_
spl levels.  For example, spltty() does _not_ block out splsched().
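
To make that reading concrete, here is a minimal user-space sketch (not
kernel code) of "a given level blocks itself and everything below it".
The names cur_ipl, spl_set_model() and intr_blocked_model(), and the
numeric levels, are hypothetical and chosen only for illustration; real
levels and their ordering are port-specific.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical levels, ordered low to high; real values are port-specific. */
enum { IPL_NONE = 0, IPL_TTY = 1, IPL_SCHED = 2, IPL_HIGH = 3 };

/* Models the processor's current interrupt priority level. */
static int cur_ipl = IPL_NONE;

/* Set the current level and return the previous one, in the style of splfoo(). */
static int
spl_set_model(int level)
{
	int old = cur_ipl;

	assert(level >= IPL_NONE && level <= IPL_HIGH);
	cur_ipl = level;
	return old;
}

/* An interrupt is held off when its level is at or below the current one. */
static bool
intr_blocked_model(int intr_level)
{
	return intr_level <= cur_ipl;
}
```

With this model, after spl_set_model(IPL_TTY), intr_blocked_model(IPL_TTY)
is true and intr_blocked_model(IPL_SCHED) is false, which matches the
spltty()/splsched() example above.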

This has the potential to cause severe confusion, both in
implementation and in understanding.  (At least it does for me.)


After some recent discussion on mailing lists, the spl(9) manpage
needs some discussion of whether it's permissible to call splfoo()
while at splbar(), where foo < bar: e.g., calling spltty() while at
splsched().

Given current implementation on some ports, such calls will fail to
guarantee synchronization -- for two reasons:

   (a)  intuitively, the spl() scheme has (at least historically)
	relied on a hierarchical acquisition of spl levels
	(something like a hierarchy of locks) to achieve synchronized
	(apparently atomic) access to data structures, without any
	possibility of deadlock;

   (b)  at least some non-BSD uses of the underlying hardware
	interrupt-priority or interrupt-enable mechanisms require that
	you _strictly raise_ the IPL or _strictly reduce_ the enabled
	interrupts to give the right semantics. I understand this
	is still true when using VAX IPLs, for example.
	(Otherwise, an interrupt-level function at a higher level
	like splsched() could call, say, spltty(), and access tty
	data structures even while a non-interrupt thread running
	at spltty() has been interrupted in the middle of accessing
	those structures.)
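
The hazard in (b) can be sketched in a small user-space model. This is
only an illustration under assumed names: cur_ipl, order_violations,
splraise_model() and splx_model() are hypothetical, as are the numeric
levels, and no port is claimed to work this way. The model never lowers
the level on an spl request and counts requests that arrive below the
current level, which is the case the restriction would forbid.

```c
#include <assert.h>

/* Hypothetical levels, ordered low to high; real values are port-specific. */
enum { IPL_NONE = 0, IPL_TTY = 1, IPL_SCHED = 2 };

static int cur_ipl = IPL_NONE;		/* models the current priority level */
static int order_violations = 0;	/* counts out-of-order spl requests */

/*
 * Raise-only model of splfoo(): never lowers the level.  A request below
 * the current level is counted, since the caller may be an interrupt
 * handler that preempted code already running at that lower level.
 */
static int
splraise_model(int level)
{
	int old = cur_ipl;

	if (level < cur_ipl)
		order_violations++;
	else
		cur_ipl = level;
	return old;
}

/* Model of splx(): restore a level previously returned by splraise_model(). */
static void
splx_model(int old)
{
	assert(old >= IPL_NONE && old <= IPL_SCHED);
	cur_ipl = old;
}
```

For example, if code at the model's IPL_TTY is preempted by code that
raises to IPL_SCHED and then requests IPL_TTY, cur_ipl stays at
IPL_SCHED and order_violations becomes 1.  Note that the count only
flags the out-of-order request; it does not by itself make the handler's
access to the tty data safe.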

I think whether or not such calls are allowed is a relevant part of
the semantics of the API and really should be documented.  It would
certainly avoid repeated misunderstandings and miscommunication about
spl() usage and spl()-related changes to both MI and MD code.

OTOH, if the current documentation accurately reflects the intended
semantics, there are at least two port masters who don't believe the
current documentation is correct and haven't implemented their ports
that way :).


>How-To-Repeat:

	see man 9 spl, followed by reading 4.3BSD vax code(?)
>Fix:

1. Change the text to say a given SPL and _lower_ levels are blocked.

2. Document the necessary(???) restrictions (needed for suitable MI semantics
   and to guarantee MI correctness) on calling a _lower_-level
   SPL function from a higher (or equal) level function.

3. Perhaps check with developers that no bugs have been introduced
   by following the current documentation?
>Audit-Trail:
>Unformatted: