Subject: Re: kern/32682: netbsd-3 ptyfs intermittent failure with Matlab
To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org
From: Christos Zoulas <christos@zoulas.com>
List: netbsd-bugs
Date: 01/31/2006 17:55:02
The following reply was made to PR kern/32682; it has been noted by GNATS.

From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org,
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: 
Subject: Re: kern/32682: netbsd-3 ptyfs intermittent failure with Matlab
Date: Tue, 31 Jan 2006 12:52:19 -0500

 On Jan 31,  5:25pm, hf@spg.tu-darmstadt.de (Hauke Fath) wrote:
 -- Subject: kern/32682: netbsd-3 ptyfs intermittent failure with Matlab
 
 | >Number:         32682
 | >Category:       kern
 | >Synopsis:       netbsd-3 ptyfs intermittent failure with Matlab
 | >Confidential:   no
 | >Severity:       serious
 | >Priority:       medium
 | >Responsible:    kern-bug-people
 | >State:          open
 | >Class:          sw-bug
 | >Submitter-Id:   net
 | >Arrival-Date:   Tue Jan 31 17:25:00 +0000 2006
 | >Originator:     Hauke Fath <hf@spg.tu-darmstadt.de>
 | >Release:        NetBSD 3.0_STABLE
 | >Organization:
 | -- 
 | /~\  The ASCII Ribbon Campaign                      Hauke Fath
 | \ /    No HTML/RTF in email	          Institut für Nachrichtentechnik
 |  X     No Word docs in email	                    TU Darmstadt
 | / \  Respect for open standards                Ruf +49-6151-16-3281
 | >Environment:
 | 	
 | 	
 | System: NetBSD Wintersberg 3.0_STABLE NetBSD 3.0_STABLE (SPG_PIII) #1: Mon Jan 23 18:52:48 CET 2006 hf@Heiligenberg:/var/obj/netbsd-builds/3_0/i386/sys/arch/i386/compile/SPG_PIII i386
 | Architecture: i386
 | Machine: i386
 | >Description:
 | 
 | 	With the pty subsystem that comes with NetBSD 3, Matlab
 | 	expects to find its ptys in /dev/pts. Every once in a while,
 | 	the required pty cannot be created, which results in Matlab 13
 | 	issuing dire warnings ("...no background processes/job
 | 	control/blah"),	and Matlab 14 simply aborting.
 | 
 | 	Sometimes the problem "goes away" after some tens of minutes,
 | 	at other times it needs a reboot to "fix". It is more likely
 | 	to appear with several users logged in on the machine.
 | 
 | 	The end of a Matlab 14 ktrace looks like
 | 
 |    [...]
 | 
 |    883 MATLAB   NAMI  "/dev/ptmx"
 |    883 MATLAB   RET   open 7
 |    883 MATLAB   CALL  ioctl(7,_IO('T',0x1,0),0xbfbf535c)
 |    883 MATLAB   RET   ioctl 0
 |    883 MATLAB   CALL  ioctl(7,_IOW('T',0x30,0x4),0xbfbf541c)
 |    883 MATLAB   GIO   fd 7 read 40 bytes
 |        "\^D\0\0\0\^D\0\0\0/dev/null\0\0\0\0\0\0\0/dev/pts/4\0\0\0\0\0\0"
 |    883 MATLAB   RET   ioctl 0
 |    883 MATLAB   CALL  stat64(0xbfbf54f0,0xbfbf5440)
 |    883 MATLAB   NAMI  "/emul/linux/dev/pts/4"
 |    883 MATLAB   NAMI  "/dev/pts/4"
 |    883 MATLAB   RET   stat64 0
 |    883 MATLAB   CALL  statfs(0xbfbf54f0,0xbfbf64f0)
 |    883 MATLAB   NAMI  "/emul/linux/dev/pts/4"
 |    883 MATLAB   NAMI  "/dev/pts/4"
 |    883 MATLAB   RET   statfs 0
 |    883 MATLAB   CALL  ioctl(7,_IOR('T',0x31,0x4),0xbfbf6528)
 |    883 MATLAB   RET   ioctl -1 errno -22 Invalid argument
 |    883 MATLAB   CALL  ioctl(7,_IO('T',0x1,0),0xbfbf63cc)
 |    883 MATLAB   RET   ioctl 0
 |    883 MATLAB   CALL  ioctl(7,_IOW('T',0x30,0x4),0xbfbf648c)
 |    883 MATLAB   GIO   fd 7 read 40 bytes
 |        "\^D\0\0\0\^D\0\0\0/dev/null\0\0\0\0\0\0\0/dev/pts/4\0\0\0\0\0\0"
 |    883 MATLAB   RET   ioctl 0
 |    883 MATLAB   CALL  stat64(0xbd3dd888,0xbfbf64b0)
 |    883 MATLAB   NAMI  "/emul/linux/dev/pts/4"
 |    883 MATLAB   NAMI  "/dev/pts/4"
 |    883 MATLAB   RET   stat64 0
 |    883 MATLAB   CALL  rt_sigaction(0x11,0xbfbf6200,0xbfbf6170,8)
 |    883 MATLAB   RET   rt_sigaction 0
 |    883 MATLAB   CALL  rt_sigprocmask(1,0xbfbf6380,0,8)
 |    883 MATLAB   RET   rt_sigprocmask 0
 |    883 MATLAB   CALL  open(0xbac662e0,0x8002,0)
 |    883 MATLAB   NAMI  "/emul/linux/dev/pts/4"
 |    883 MATLAB   NAMI  "/dev/pts/4"
 |    883 MATLAB   RET   open -1 errno -13 Permission denied
 |    883 MATLAB   CALL  rt_sigprocmask(1,0xbfbf02b0,0,8)
 |    883 MATLAB   RET   rt_sigprocmask 0
 |    883 MATLAB   CALL  kill(0x373, SIGABRT)
 |    883 MATLAB   RET   kill 0
 |    883 MATLAB   PSIG  SIGABRT SIG_DFL
 |    883 MATLAB   NAMI  "MATLAB.core"
 |  27966 MATLAB   RET   poll 0
 |  27966 MATLAB   CALL  getppid
 |  27966 MATLAB   RET   getppid 1
 |  27966 MATLAB   CALL  kill(0x6db7, SIGKILL)
 |  27966 MATLAB   RET   kill -1 errno -3 No such process
 |  27966 MATLAB   CALL  kill(0x1518, SIGKILL)
 |  27966 MATLAB   RET   kill 0
 |   5400 MATLAB   RET   nanosleep -1 errno -4 Interrupted system call
 |   5400 MATLAB   PSIG  SIGKILL SIG_DFL
 |  27966 MATLAB   PSIG  SIGRT1 caught handler=0xbd4c2eb0 mask=(1,2,3,4,6,8,10,11,12,13,14,15,16,18,19,20,21,22,23,24,25,26,27,28,30,31,32,33))
 |  27966 MATLAB   CALL  sigreturn(0x80a14b4)
 |  27966 MATLAB   RET   sigreturn -1 errno -2 No such file or directory
 |  27966 MATLAB   CALL  exit_group(0)
 | 
 | where
 | 
 | [hf@Wintersberg] /var/tmp > ll /dev/pts
 | total 0
 | 0 crw-rw-rw-  1 root    wheel  5, 0 Jan 29 22:56 0
 | 0 crw-rw-rw-  1 root    wheel  5, 1 Jan 31 03:15 1
 | 0 crw-rw-rw-  1 root    wheel  5, 2 Jan 31 00:17 2
 | 0 crw--w----  1 cbrown  tty    5, 3 Jan 20 16:48 3
 | 0 crw--w----  1 hf      tty    5, 5 Jan 31 18:04 5
 | [hf@Wintersberg] /var/tmp >
 | 
 | The Matlab core and ktrace.out are at
 | http://www.spg.tu-darmstadt.de/~hf/netbsd/matlab-ptyfs-pr.tar.bz2 
 | (4.3 MB).
 | 
 | >How-To-Repeat:
 | 
 | 	Start Matlab 13/14 on a NetBSD/i386 3 machine. Try a few
 | 	times, from different user accounts.
 
 Can you show the output of w(1) and list the "interesting" old-style
 ptys (those matching /dev/[pt]ty??)? I suspect that a rogue program is
 opening old-style ptys behind the pty subsystem's back, so that when
 ptyfs tries to open the same pty, it fails. For example, when it fails
 for pts/4, what does lsof say for /dev/{t,p}typ4?
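
 To work out which old-style nodes to hand to lsof for a given failing
 pts index, something like the following might help. It is only a
 sketch: the helper name legacy_pty_names is made up for illustration,
 and it assumes the traditional 16-per-letter naming (ttyp0..ttypf, then
 ttyq0..ttyqf, and so on). Only the pts/4 to /dev/{t,p}typ4 case comes
 from this PR; please check the other names against your own /dev.

```python
# Hypothetical helper: map a ptyfs slave index (the N in /dev/pts/N) to the
# old-style BSD pty pair that would share the same underlying pty.
# ASSUMPTION: old-style names use 16 units per letter, starting at 'p'.
LETTERS = "pqrstuvwxyz"
DIGITS = "0123456789abcdef"


def legacy_pty_names(index):
    """Return (slave, master) old-style paths for a pts index."""
    limit = len(LETTERS) * len(DIGITS)
    if not 0 <= index < limit:
        raise ValueError("index %d outside assumed range 0..%d" % (index, limit - 1))
    suffix = LETTERS[index // len(DIGITS)] + DIGITS[index % len(DIGITS)]
    return ("/dev/tty" + suffix, "/dev/pty" + suffix)


if __name__ == "__main__":
    # Print the two nodes to inspect for the pts/4 failure in this report.
    for path in legacy_pty_names(4):
        print(path)
```

 The two paths it prints can then be passed to lsof (or fstat) to see
 whether some process already holds the old-style device open.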
 
 christos