Subject: kern/8142: rnd driver uses splsoftclock() incorrectly
To: None <gnats-bugs@gnats.netbsd.org>
From: Bill Sommerfeld <sommerfeld@orchard.arlington.ma.us>
List: netbsd-bugs
Date: 08/04/1999 05:38:45
>Number:         8142
>Category:       kern
>Synopsis:       rnd driver uses splsoftclock() incorrectly
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people (Kernel Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Aug  4 05:05:00 1999
>Last-Modified:
>Originator:     Bill Sommerfeld
>Organization:
	
>Release:        19990804
>Environment:
	
System: NetBSD orchard.arlington.ma.us 1.4 NetBSD 1.4 (ORCHARDII) #54: Sun May 16 10:05:51 EDT 1999 sommerfeld@orchard.arlington.ma.us:/usr/src/sys/arch/i386/compile/ORCHARDII i386


>Description:
	splsoftclock() is a "special" spl call: it *lowers* rather than
	*raises* interrupt protection, and is intended for use by the
	hardclock() mechanism.

	rnd.c appears to use it as if it were an spl-raising call. When
	the caller is already above softclock level, the call drops
	protection instead of adding it.
	
	Also, ptcpoll() in tty_pty.c seems to make the same potential
	mistake (though that may be less problematic).
	
>How-To-Repeat:
	See the backtrace reported to current-users by Manuel Bouyer,
	which shows multiple nested calls to ipintr():

...
#8  0xf0157b57 in ip_output ()
#9  0xf015c970 in syn_cache_respond ()
#10 0xf015bd7a in syn_cache_get ()
#11 0xf0159dc7 in tcp_input ()
#12 0xf01542d1 in ipintr ()
#13 0xf0101cac in Xsoftnet ()
#14 0xf015e4b3 in tcp_new_iss ()
#15 0xf015c56c in syn_cache_add ()
#16 0xf0159e3e in tcp_input ()
#17 0xf01542d1 in ipintr ()
#18 0xf0101cac in Xsoftnet ()
#19 0xf015e4b3 in tcp_new_iss ()
#20 0xf015c56c in syn_cache_add ()
#21 0xf0159e3e in tcp_input ()
#22 0xf01542d1 in ipintr ()
...

	Most likely, a large number of SYN packets are arriving
	simultaneously from the point of view of the softnet mechanism,
	and each recursive invocation of ipintr() handles one packet.
	The frames show Xsoftnet being entered from inside
	tcp_new_iss(), presumably at the point where the spl has been
	lowered.

	I'm willing to believe that if the number of simultaneous SYNs
	is small, we survive.

	If the number of simultaneous SYNs times the stack depth taken
	by one invocation exceeds the stack size, we blow the stack,
	fall over, and die.
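
	The recursion described above can be reproduced in a small
	userland model. This is only a sketch under stated assumptions,
	not the kernel code: the level values, spl_raise(), spl_lower(),
	extract_random(), softnet_handler() and simulate_max_depth() are
	all hypothetical stand-ins, and the model assumes that lowering
	the level immediately delivers any pending soft interrupt.

```c
#include <assert.h>

/* Toy priority levels; names echo the kernel's, values are arbitrary. */
enum { IPL_NONE = 0, IPL_SOFTCLOCK = 1, IPL_SOFTNET = 2, IPL_HIGH = 3 };

static int cur_ipl;	/* current (simulated) interrupt priority level */
static int pending;	/* packets queued for the soft network interrupt */
static int depth;	/* current nesting of softnet_handler() */
static int max_depth;	/* deepest nesting seen so far */
static int use_lower;	/* 1: critical section uses the lowering call */

static void softnet_handler(void);

/* Deliver the pending soft interrupt whenever the level permits it. */
static void run_pending(void)
{
	while (pending > 0 && cur_ipl < IPL_SOFTNET)
		softnet_handler();
}

/* Raise-only: never drops below the current level. */
static int spl_raise(int level)
{
	int old = cur_ipl;

	if (level > cur_ipl)
		cur_ipl = level;
	return old;
}

/* Unconditional set, modelling how this PR describes splsoftclock(). */
static int spl_lower(int level)
{
	int old = cur_ipl;

	cur_ipl = level;
	run_pending();		/* newly unmasked interrupts fire here */
	return old;
}

static void splx(int s)
{
	cur_ipl = s;
	run_pending();
}

/* Hypothetical stand-in for the random pool's critical section. */
static void extract_random(void)
{
	int s = use_lower ? spl_lower(IPL_SOFTCLOCK) : spl_raise(IPL_HIGH);

	/* ... the pool would be touched here ... */
	splx(s);
}

/* Hypothetical stand-in for ipintr(): drain the queue, one packet at a time. */
static void softnet_handler(void)
{
	int s = spl_raise(IPL_SOFTNET);	/* interrupt entry blocks softnet */

	if (++depth > max_depth)
		max_depth = depth;
	while (pending > 0) {
		pending--;
		extract_random();	/* each packet needs random bits */
	}
	depth--;
	cur_ipl = s;			/* interrupt return */
}

/* Returns the deepest handler nesting reached for npackets queued packets. */
int simulate_max_depth(int npackets, int buggy)
{
	assert(npackets >= 0);
	cur_ipl = IPL_NONE;
	pending = npackets;
	depth = max_depth = 0;
	use_lower = buggy;
	run_pending();
	return max_depth;
}
```

	In this model simulate_max_depth(5, 1) returns 5, one nested
	handler per queued packet, while simulate_max_depth(5, 0)
	returns 1 because the raise-only call keeps softnet blocked
	until the outer handler has drained the queue.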

>Fix:
	workaround: turn off pseudo-device rnd and rebuild the kernel.
	real fix: "uhh, don't do that"
		Find a *real* spl level at which to run the random
		pool.

		

	

>Audit-Trail:
>Unformatted: