Subject: Re: After newlock2 merge: Different pthread behavior for userland programs?
To: mk@kilbi.de, Andrew Doran <ad@NetBSD.org>
From: Brian Buhrow <buhrow@lothlorien.nfbcal.org>
List: port-cobalt
Date: 03/12/2007 08:22:40
	I had similar behavior with one of our production Asterisk boxes
yesterday as well, and it took me a while to figure it out.  If it's the
same problem, which I'm not sure of, it's definitely not a threading issue.
I know this because my box is NetBSD-2.x with Asterisk-1.0.x, and I'm sure
there have been no threading changes in it recently.  What I discovered was
that another Asterisk box which was talking to it had not been patched with
updated time zone files.  For some reason, which I don't know, it was
sending packets which caused the primary Asterisk server to
stop receiving data on its primary SIP socket.  You can check to see if
it's the same symptom by looking at the output of netstat -s.
If you run:
	netstat -s | grep 'socket'
a few times and see the count of packets discarded because the socket was
full going up from one run to the next, you're probably seeing the same issue.
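
If you'd rather not eyeball it, here is a rough sketch of the same check as
a script.  It assumes the relevant netstat -s lines start with their counter
and contain the words "full socket" (the exact wording may differ between
releases, so adjust PATTERN); socket_drops, check_growth, PATTERN and
INTERVAL are names I made up for this example.

```shell
#!/bin/sh
# Sum the counters on "netstat -s" lines matching PATTERN (read from stdin).
# Assumption: such lines look like "12 dropped due to full socket buffers",
# i.e. the counter is the first field.  Prints 0 if nothing matches.
socket_drops() {
    awk -v pat="${PATTERN:-full socket}" \
        '$0 ~ pat && $1 ~ /^[0-9]+$/ { total += $1 } END { print total + 0 }'
}

# Take two samples INTERVAL seconds apart (default 10, a hypothetical knob)
# and succeed only if the drop counter grew in between.
check_growth() {
    before=$(netstat -s | socket_drops)
    sleep "${INTERVAL:-10}"
    after=$(netstat -s | socket_drops)
    echo "full-socket drops: $before -> $after"
    [ "$after" -gt "$before" ]
}
```

Calling check_growth at the end of the script while calls are failing should
show whether the counter is climbing; a rising number points at the receive
socket filling up rather than at anything thread-related.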
	My fix was to patch the offending box, reboot it, and all is well
again.

Hope that helps.
-Brian
On Mar 11,  5:57pm, Markus W Kilbinger wrote:
} Subject: Re: After newlock2 merge: Different pthread behavior for userland
} >>>>> "Doran" == Andrew Doran <ad@netbsd.org> writes:
} 
}     >> After updating the whole machine to the newlock2 base (kernel +
}     >> userland) asterisk seems to start up fine, but no longer
}     >> accepts (all) incoming phone calls!?
} 
}     Doran> Hi, Thanks for the problem report.
} 
}     Doran> If you could file a PR about this it would be ideal. A good
}     Doran> first step for diagnosing the problem would be to attach to
}     Doran> the process responsible for handling incoming calls with
}     Doran> "ktrace -di -p $pid", and make available the ktrace.out
}     Doran> file that is produced (or relevant excerpts from kdump -R,
}     Doran> afterwards).
} 
} Sorry, I had no time so far to do any testing (too many NMIs :-/),
} but today I noticed something else/strange: My qube2 ran into the
} (known) situation that asterisk no longer accepts calls (in these
} situations asterisk says: 'Mar 11 10:43:11 WARNING[587] app.c: No
} audio available on SIP/0800615243-00789000??').
} 
} So I tried to stop the running asterisk process and restarted it,
} which seemed to work (the process vanished from the process list and
} re-appeared after its restart). Now the big 'but': The machine was
} still not able to accept calls!? Repeating asterisk's stop/restart
} procedure didn't help either. Only rebooting the whole machine made
} asterisk (initially) accept calls again. How can this be!?
} 
} Which 'relicts' of a terminated program can survive its restart? Any
} other idea for this 'behavior'?
} 
} Markus.
>-- End of excerpt from Markus W Kilbinger