Subject: Locking trouble in net/if_tun.c
To: None <tech-net@netbsd.org>
From: Tom Ivar Helbekkmo <tih@eunetnorge.no>
List: tech-net
Date: 05/09/2004 21:56:46
I've been playing with openvpn, from pkgsrc, lately. It works great,
except that it's been making my base station VPN gateway hang hard
whenever I pressed the tunnel a bit. Every time I pushed more than a
little bit of data through the VPN tunnel, the VPN gateway would get
hung up at the start of tun_output, at the simple_lock() call:
int
tun_output(ifp, m0, dst, rt)
struct ifnet *ifp;
struct mbuf *m0;
struct sockaddr *dst;
struct rtentry *rt;
{
struct tun_softc *tp = ifp->if_softc;
#ifdef INET
int s;
int error;
#endif
int mlen;
ALTQ_DECL(struct altq_pktattr pktattr;)
simple_lock(&tp->tun_lock);
The machine would be hung so hard, it didn't respond to keypresses on
the console, and DDB would show that it was at this location because
of a packet coming in over the ethernet, destined for the client side
of the VPN tunnel. The DDB backtrace would always show either
ip_input->ip_forward->ip_output->tun_output
or
ip_input->tcp_input->tcp_output->ip_output->tun_output
Now, looking for a reson why the tunnel was locked, causing a deadlock
here, I first noticed that there's one single return from tun_output()
that doesn't unlock the tun_lock. It looks like this:
case AF_UNSPEC:
s = splnet();
IFQ_ENQUEUE(&ifp->if_snd, m0, &pktattr, error);
if (error) {
splx(s);
ifp->if_collisions++;
return (error);
}
I added a call to simple_unlock() before the return, but this did not
help. (I'm mentioning it anyway, because it looks to me like it's
wrong to leave the tun locked here.) Then, I figured that a "try
again" type error return might do the trick, so I changed the start of
tun_output thus:
RCS file: /cvsroot/src/sys/net/if_tun.c,v
retrieving revision 1.68
diff -c -r1.68 if_tun.c
*** if_tun.c 1 Mar 2004 13:54:02 -0000 1.68
--- if_tun.c 9 May 2004 19:42:03 -0000
***************
*** 438,446 ****
int mlen;
ALTQ_DECL(struct altq_pktattr pktattr;)
- simple_lock(&tp->tun_lock);
TUNDEBUG ("%s: tun_output\n", ifp->if_xname);
if ((tp->tun_flags & TUN_READY) != TUN_READY) {
TUNDEBUG ("%s: not ready 0%o\n", ifp->if_xname,
tp->tun_flags);
--- 438,450 ----
int mlen;
ALTQ_DECL(struct altq_pktattr pktattr;)
TUNDEBUG ("%s: tun_output\n", ifp->if_xname);
+ if (!simple_lock_try(&tp->tun_lock)) {
+ TUNDEBUG ("%s: already locked\n", ifp->if_xname);
+ return (ENOBUFS);
+ }
+
if ((tp->tun_flags & TUN_READY) != TUN_READY) {
TUNDEBUG ("%s: not ready 0%o\n", ifp->if_xname,
tp->tun_flags);
***************
*** 496,501 ****
--- 500,506 ----
if (error) {
splx(s);
ifp->if_collisions++;
+ simple_unlock(&tp->tun_lock);
return (error);
}
mlen = m0->m_pkthdr.len;
The ENOBUFS is supposed to indicate a transient error. Note that I've
kept the unlock in the collision case, because I still believe it's
supposed to be there -- but I may be wrong. Anyway, after this
change, my VPN tunnel works great, there are no hangs, and I do not
see any degradation in performance.
Anyone care to look this over, and try to understand whether I've done
the right thing, or just masked the symptoms of another problem? :-)
-tih
--
Tom Ivar Helbekkmo, Senior System Administrator, EUnet Norway
www.eunet.no T: +47-22092958 M: +47-93013940 F: +47-22092901