netbsd-bugs: kern/9384: "usl_detachtimeout" with multiple X servers started by xdm (wscons)

Subject: kern/9384: "usl_detachtimeout" with multiple X servers started by xdm (wscons)
To: None <gnats-bugs@gnats.netbsd.org>
From: Robert Elz <kre@munnari.OZ.AU>
List: netbsd-bugs
Date: 02/09/2000 21:39:55
>Number:         9384
>Category:       kern
>Synopsis:       kernel prints "usl_detachtimeout" with multiple X servers under wscons
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    kern-bug-people (Kernel Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Feb  9 21:39:00 2000
>Last-Modified:
>Originator:     Robert Elz
>Organization:
	The University of Melbourne
>Release:        NetBSD-1.4.1 (and, I believe, all versions since, to date)
>Environment:
	
System: NetBSD fuchsia.home.cs.mu.OZ.AU 1.4.1 NetBSD 1.4.1 (FUCHSIA) #2: Sat Dec 4 05:28:12 EST 1999 kre@fuchsia.home.cs.mu.OZ.AU:/a/src/sys/arch/i386/compile/FUCHSIA i386


>Description:

	If xdm is asked to start multiple X servers on a wscons
	system (i386 anyway; I don't know whether this is possible on other
	ports), the kernel periodically prints "usl_detachtimeout"
	and yanks the console display back to ttyE0.  Ctl-Alt-Fn returns
	to the X display, but the X server is apparently unaware that it
	had lost control of the display (no refresh is done, though
	running "xrefresh" restores the display).

>How-To-Repeat:

	Along with appropriate wscons configuration, put this
		:0 local /usr/X11R6/bin/X vt08
		:1 local /usr/X11R6/bin/X :1 vt07
	in xdm/Xservers and reboot.

>Fix:

	To work around the problem, manually (Ctl-Alt-Fn) give each X
	server control of the display.  When xdm starts, one of the two X
	servers gets the display handed to it automatically (the way that
	usually happens when an X server is started), but the other does
	not.  Once that other server has had control of the display at least
	once, the problem is avoided until the next reboot.

	In more detail, Matthias Drochner says ...

As far as I can see, this is due to brokenness in the USL virtual
console switching protocol, or at least in the way XFree86
deals with it.
The X server does the following (from memory, sorry if I miss something):
1. open the first vt
2. find the first free virtual screen
3. close the first vt and open the vt device of the free screen, or of
   the one passed on the command line
4. request a switch to that screen
5. wait for the switch to be done
6. initiate process synchronisation, i.e. reserve the screen for use by
   the X server
7. begin graphics initialisation

Now there is a race condition and a timing problem: if another process
switches the screen away between (5) and (7), the X server can do hardware
actions while not actually possessing the screen, and if another process
has taken possession of the screen and is doing something lengthy, (5) can
time out.

IMHO the X server must do (6) first (it can possess a screen while the
screen is not yet active), and should do some error handling in
case of switch timeouts, e.g. retry.
As far as I can see, this would break with pcvt, which contains some
specific hacks to avoid the race condition which I refused to adopt
because they are broken in other ways.
The right thing would be to stop using a pcvt compatibility mode in
XFree86 and do it better from the beginning.
The problem didn't bother me enough to spend time on it, and now we'll
get XFree86 v4 soon, so it is kind of pointless to hack on the
old codebase.

A workaround could be to delay the startup of one of the X servers
for a few seconds: use a script which does "sleep 5; exec X $*" instead
of one of the entries in Xservers. (Just a guess, I didn't try it.)
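
The suggested wrapper could be sketched roughly as below. This is untested
against the bug itself; the script name "Xdelayed" and the X_START_DELAY and
XSERVER variables are hypothetical, introduced only for illustration. The
server path is the one used in the Xservers entries above, and the 5 second
default is the value guessed at in the suggestion.

```shell
#!/bin/sh
# Sketch of a delayed-start wrapper for one Xservers entry.
# Hypothetical: the name "Xdelayed" and the X_START_DELAY / XSERVER
# variables are illustrative, not part of xdm or XFree86.

start_delayed_x() {
    # Give the other X server time to finish its VT switch first.
    sleep "${X_START_DELAY:-5}"
    # "$@" (rather than $*) passes each Xservers argument through unchanged.
    exec "${XSERVER:-/usr/X11R6/bin/X}" "$@"
}

# Start the server only when installed under the hypothetical wrapper name.
case "${0##*/}" in
    Xdelayed) start_delayed_x "$@" ;;
esac
```

Assuming the script is installed as /usr/local/bin/Xdelayed (again a
hypothetical path), the second Xservers entry would then read something like
":1 local /usr/local/bin/Xdelayed :1 vt07", with the first entry left as is.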

	I have not tried that suggested workaround yet, but it sounds
	entirely reasonable - manually starting multiple X servers doesn't
	have the problem that the (more or less) simultaneous start from
	xdm produces.
>Audit-Trail:
>Unformatted: