Subject: Re: kern/32682: netbsd-3 ptyfs intermittent failure with Matlab
To: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org
From: Christos Zoulas <christos@zoulas.com>
List: netbsd-bugs
Date: 01/31/2006 12:52:19
On Jan 31,  5:25pm, hf@spg.tu-darmstadt.de (Hauke Fath) wrote:
-- Subject: kern/32682: netbsd-3 ptyfs intermittent failure with Matlab

| >Number:         32682
| >Category:       kern
| >Synopsis:       netbsd-3 ptyfs intermittent failure with Matlab
| >Confidential:   no
| >Severity:       serious
| >Priority:       medium
| >Responsible:    kern-bug-people
| >State:          open
| >Class:          sw-bug
| >Submitter-Id:   net
| >Arrival-Date:   Tue Jan 31 17:25:00 +0000 2006
| >Originator:     Hauke Fath <hf@spg.tu-darmstadt.de>
| >Release:        NetBSD 3.0_STABLE
| >Organization:
| -- 
| /~\  The ASCII Ribbon Campaign                      Hauke Fath
| \ /    No HTML/RTF in email	          Institut für Nachrichtentechnik
|  X     No Word docs in email	                    TU Darmstadt
| / \  Respect for open standards                Ruf +49-6151-16-3281
| >Environment:
| 	
| 	
| System: NetBSD Wintersberg 3.0_STABLE NetBSD 3.0_STABLE (SPG_PIII) #1: Mon Jan 23 18:52:48 CET 2006 hf@Heiligenberg:/var/obj/netbsd-builds/3_0/i386/sys/arch/i386/compile/SPG_PIII i386
| Architecture: i386
| Machine: i386
| >Description:
| 
| 	With the pty subsystem that comes with NetBSD 3, Matlab
| 	expects to find its ptys in /dev/pts. Every once in a while,
| 	the required pty cannot be created, which results in Matlab 13
| 	issuing dire warnings ("...no background processes/job
| 	control/blah"),	and Matlab 14 simply aborting.
| 
| 	Sometimes the problem "goes away" after some tens of minutes,
| 	at other times it needs a reboot to "fix". It is more likely
| 	to appear with several users logged in on the machine.
| 
| 	The end of a Matlab 14 ktrace looks like
| 
|    [...]
| 
|    883 MATLAB   NAMI  "/dev/ptmx"
|    883 MATLAB   RET   open 7
|    883 MATLAB   CALL  ioctl(7,_IO('T',0x1,0),0xbfbf535c)
|    883 MATLAB   RET   ioctl 0
|    883 MATLAB   CALL  ioctl(7,_IOW('T',0x30,0x4),0xbfbf541c)
|    883 MATLAB   GIO   fd 7 read 40 bytes
|        "\^D\0\0\0\^D\0\0\0/dev/null\0\0\0\0\0\0\0/dev/pts/4\0\0\0\0\0\0"
|    883 MATLAB   RET   ioctl 0
|    883 MATLAB   CALL  stat64(0xbfbf54f0,0xbfbf5440)
|    883 MATLAB   NAMI  "/emul/linux/dev/pts/4"
|    883 MATLAB   NAMI  "/dev/pts/4"
|    883 MATLAB   RET   stat64 0
|    883 MATLAB   CALL  statfs(0xbfbf54f0,0xbfbf64f0)
|    883 MATLAB   NAMI  "/emul/linux/dev/pts/4"
|    883 MATLAB   NAMI  "/dev/pts/4"
|    883 MATLAB   RET   statfs 0
|    883 MATLAB   CALL  ioctl(7,_IOR('T',0x31,0x4),0xbfbf6528)
|    883 MATLAB   RET   ioctl -1 errno -22 Invalid argument
|    883 MATLAB   CALL  ioctl(7,_IO('T',0x1,0),0xbfbf63cc)
|    883 MATLAB   RET   ioctl 0
|    883 MATLAB   CALL  ioctl(7,_IOW('T',0x30,0x4),0xbfbf648c)
|    883 MATLAB   GIO   fd 7 read 40 bytes
|        "\^D\0\0\0\^D\0\0\0/dev/null\0\0\0\0\0\0\0/dev/pts/4\0\0\0\0\0\0"
|    883 MATLAB   RET   ioctl 0
|    883 MATLAB   CALL  stat64(0xbd3dd888,0xbfbf64b0)
|    883 MATLAB   NAMI  "/emul/linux/dev/pts/4"
|    883 MATLAB   NAMI  "/dev/pts/4"
|    883 MATLAB   RET   stat64 0
|    883 MATLAB   CALL  rt_sigaction(0x11,0xbfbf6200,0xbfbf6170,8)
|    883 MATLAB   RET   rt_sigaction 0
|    883 MATLAB   CALL  rt_sigprocmask(1,0xbfbf6380,0,8)
|    883 MATLAB   RET   rt_sigprocmask 0
|    883 MATLAB   CALL  open(0xbac662e0,0x8002,0)
|    883 MATLAB   NAMI  "/emul/linux/dev/pts/4"
|    883 MATLAB   NAMI  "/dev/pts/4"
|    883 MATLAB   RET   open -1 errno -13 Permission denied
|    883 MATLAB   CALL  rt_sigprocmask(1,0xbfbf02b0,0,8)
|    883 MATLAB   RET   rt_sigprocmask 0
|    883 MATLAB   CALL  kill(0x373, SIGABRT)
|    883 MATLAB   RET   kill 0
|    883 MATLAB   PSIG  SIGABRT SIG_DFL
|    883 MATLAB   NAMI  "MATLAB.core"
|  27966 MATLAB   RET   poll 0
|  27966 MATLAB   CALL  getppid
|  27966 MATLAB   RET   getppid 1
|  27966 MATLAB   CALL  kill(0x6db7, SIGKILL)
|  27966 MATLAB   RET   kill -1 errno -3 No such process
|  27966 MATLAB   CALL  kill(0x1518, SIGKILL)
|  27966 MATLAB   RET   kill 0
|   5400 MATLAB   RET   nanosleep -1 errno -4 Interrupted system call
|   5400 MATLAB   PSIG  SIGKILL SIG_DFL
|  27966 MATLAB   PSIG  SIGRT1 caught handler=0xbd4c2eb0 mask=(1,2,3,4,6,8,10,11,12,13,14,15,16,18,19,20,21,22,23,24,25,26,27,28,30,31,32,33))
|  27966 MATLAB   CALL  sigreturn(0x80a14b4)
|  27966 MATLAB   RET   sigreturn -1 errno -2 No such file or directory
|  27966 MATLAB   CALL  exit_group(0)
| 
| where
| 
| [hf@Wintersberg] /var/tmp > ll /dev/pts
| total 0
| 0 crw-rw-rw-  1 root    wheel  5, 0 Jan 29 22:56 0
| 0 crw-rw-rw-  1 root    wheel  5, 1 Jan 31 03:15 1
| 0 crw-rw-rw-  1 root    wheel  5, 2 Jan 31 00:17 2
| 0 crw--w----  1 cbrown  tty    5, 3 Jan 20 16:48 3
| 0 crw--w----  1 hf      tty    5, 5 Jan 31 18:04 5
| [hf@Wintersberg] /var/tmp >
| 
| The Matlab core and ktrace.out are at
| http://www.spg.tu-darmstadt.de/~hf/netbsd/matlab-ptyfs-pr.tar.bz2 
| (4.3 MB).
| 
| >How-To-Repeat:
| 
| 	Start Matlab 13/14 on a NetBSD/i386 3 machine. Try a few
| 	times, from different user accounts.

Can you show what w(1) prints, and list the "interesting" ptys in
/dev/[pt]ty??? I suspect that a rogue program is opening old-style ptys
behind the pty subsystem's back, so that when ptyfs tries to open the
same pty, it fails. When it fails for pts/4, for example, what does lsof
say for /dev/{t,p}typ4?
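
For anyone who wants to script that check, here is a minimal sketch in
Python. It assumes that ptyfs index N in the range 0-15 corresponds to
the old-style pair /dev/ptypN and /dev/ttypN with a hexadecimal suffix,
which is the correspondence implied by the pts/4 example above; higher
indices are deliberately not handled. The function names
(old_style_names, describe) are made up for illustration and are not
part of any NetBSD tool.

```python
import os
import stat


def old_style_names(index, dev_dir="/dev"):
    """Return (master, slave) old-style pty paths for a ptyfs index.

    Assumption: only the first bank ("p", indices 0-15) is covered,
    matching the pts/4 <-> {t,p}typ4 example in the message.
    """
    if not 0 <= index <= 15:
        raise ValueError("only indices 0-15 are covered by this sketch")
    suffix = format(index, "x")  # 10 -> "a", 15 -> "f"
    return (dev_dir + "/ptyp" + suffix, dev_dir + "/ttyp" + suffix)


def describe(path):
    """Return a one-line summary of a device node, or None if it is absent."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    return "%s mode=%s uid=%d gid=%d" % (
        path, stat.filemode(st.st_mode), st.st_uid, st.st_gid)


if __name__ == "__main__":
    # Print ownership and mode of the old-style nodes that shadow pts/4.
    for node in old_style_names(4):
        print(describe(node) or node + ": not present")
```

An old-style slave owned by a regular user rather than root while the
matching /dev/pts entry is missing would be consistent with the
suspicion above, but lsof is still needed to name the process that
holds the node open.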

christos