Subject: Re: kern/20337: current-current kernel drives sendmail crazy
To: Frank Kardel <kardel@acm.org>
From: Sean Davis <dive@endersgame.net>
List: netbsd-bugs
Date: 02/14/2003 06:28:59
On Fri, Feb 14, 2003 at 09:49:57AM +0100, Frank Kardel wrote:
> 
> >Number:         20337
> >Category:       kern
> >Synopsis:       current-20030213 kernel lets sendmail-current+bells loop
> >Confidential:   no
> >Severity:       serious
> >Priority:       high
> >Responsible:    kern-bug-people
> >State:          open
> >Class:          sw-bug
> >Submitter-Id:   net
> >Arrival-Date:   Fri Feb 14 00:51:01 PST 2003
> >Closed-Date:
> >Last-Modified:
> >Originator:     Frank Kardel
> >Release:        NetBSD 1.6N-20030213
> >Organization:
> 	
> >Environment:
> 	
> 	
> System: NetBSD pip 1.6N NetBSD 1.6N (PIP) #0: Tue Feb 11 13:36:20 MET 2003 kardel@pip:/src/NetBSD/netbsd/sys/arch/i386/compile/PIP i386
> Architecture: i386
> Machine: i386
> >Description:
> 	The current version of the kernel drives sendmail into a
> 	tight loop after delivering the mail locally but before 
> 	removing the mail from the queue. sendmail loops tightly
> 	on a read(2) that returns EAGAIN over and over again 8-).
> 	kernels from 2003-02-11 do not show that behavior.
> 	The sendmail is from pkgsrc/mail/sendmail with some bells
> 	and whistles added (for TLS/LDAP etc...) thus the sendmail
> 	i/o code might be more susceptible to changed/unexpected return
> 	codes from read(2) - older kernels do not trigger that behaviour.

I've noticed this behavior in another program: ssh.com's SSH Secure Shell
v3.2.2. Up until about the date you stated, ssh2 (ssh.com's ssh2, not
OpenSSH's) worked fine. Now, with a current kernel, it hangs at some point
(not always at the same place), and the trace is always the same: read()
returning EAGAIN, over and over again. If I kill ssh2 and log in with OpenSSH
to the remote host, I find that my input did in fact get there, but the output
from the remote host never made it back to me. So I've had to use OpenSSH's
ssh to log in to this host (a Linux machine running OpenSSH), because OpenSSH
is strangely unaffected. Perhaps it polls differently.
I haven't dug too deep into the code, but I think this needs further
investigation: when something works one day and stops working after a
current upgrade, even if you rebuild it, that smells like a bug to me.
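
For reference, here is a minimal sketch of how a program would normally be
expected to handle EAGAIN on a non-blocking descriptor: sleep in poll(2)
until the descriptor is readable instead of calling read(2) again right
away. This is not taken from sendmail or ssh2; the helper names
set_nonblocking() and read_when_ready() are made up for illustration, and
whether either program does anything like this is an assumption on my part.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: switch a descriptor to non-blocking mode. */
int
set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags == -1)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * Hypothetical helper: read from a non-blocking descriptor, waiting in
 * poll(2) rather than spinning when read(2) reports EAGAIN.
 * Returns the number of bytes read, 0 on EOF, or -1 on error or timeout
 * (errno is set to ETIMEDOUT on timeout).
 */
ssize_t
read_when_ready(int fd, void *buf, size_t len, int timeout_ms)
{
	for (;;) {
		ssize_t n = read(fd, buf, len);

		if (n >= 0)
			return n;		/* data or EOF */
		if (errno == EINTR)
			continue;		/* interrupted, retry the read */
		if (errno != EAGAIN && errno != EWOULDBLOCK)
			return -1;		/* real error */

		/* Nothing to read yet: block here until there is. */
		struct pollfd pfd = { .fd = fd, .events = POLLIN };
		int r = poll(&pfd, 1, timeout_ms);

		if (r == 0) {
			errno = ETIMEDOUT;	/* gave up waiting */
			return -1;
		}
		if (r < 0 && errno != EINTR)
			return -1;		/* poll itself failed */
	}
}
```

Note that if the kernel reports the descriptor as readable while read()
still returns EAGAIN, even a loop like this would spin, so a trace of
repeated EAGAINs does not by itself say which side is at fault.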

-- 
/~\ The ASCII
\ / Ribbon Campaign                   Sean Davis
 X  Against HTML                       aka dive
/ \ Email!