Subject: network lockup
To: MacBSD General <macbsd-general@NetBSD.ORG>
From: Dave Leonard <d@s160828.slip.cc.uq.edu.au>
List: macbsd-general
Date: 04/27/1995 09:39:45
SUMMARY: network stuff "locks up" after a while. processes block in wchan
         'soopts'

Hi, netbsd'ers

Can anyone help me with this? Perhaps it's because I have a 1.0 kernel? I'll
tell my story, and any info/ideas back would be appreciated.

After a reasonable time, all network stuff seems to 'lock up', and as a 
consequence the remote machine providing my PPP link hangs up due to 
inactivity. This has been happening periodically for the last few weeks.

So this morning, I come in and find things hung up, and the auto-redial hasn't
worked; the program that periodically pings to keep this connection alive,
and redials if necessary, ends up with its 'ping' in this state:

  UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT TT       TIME COMMAND
    0 17801    96   1 -20  0   172   84 soopts D    ??    0:00.15 ping -c 1 -i 30 130.120.2.15
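
For illustration only, a keepalive of this kind could be sketched as below.
This is not the actual program from this post; the names probe_ok and
keepalive_step, the 20 second timeout, and the redial callback are all
assumptions. The point is to time-box the probe so that a hung 'ping' does
not also hang the loop that is supposed to redial.

```python
import subprocess
import sys
import time


def probe_ok(cmd, timeout):
    """Run cmd; return True only if it exits 0 within `timeout` seconds."""
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        rc = proc.poll()          # non-blocking check on the child
        if rc is not None:
            return rc == 0
        time.sleep(0.05)
    # Best effort only: a process in uninterruptible sleep (STAT "D")
    # will not act on the signal until it wakes, so we do not wait for it.
    proc.kill()
    return False


def keepalive_step(probe_cmd, redial, timeout=20.0):
    """One round: probe the link; call redial() if the probe fails or hangs."""
    if probe_ok(probe_cmd, timeout):
        return "up"
    redial()                      # hypothetical callback that restarts the link
    return "redialed"


if __name__ == "__main__":
    # Stand-in probe so the sketch runs anywhere; a real setup would pass
    # something like ["ping", "-c", "1", "<remote host>"] instead.
    print(keepalive_step([sys.executable, "-c", "pass"], lambda: None))
```

Polling with Popen rather than waiting on the child is deliberate here:
waiting for a child that is stuck in the kernel would block the watchdog too.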
 
On the console is the hangup-due-to-inactivity message:

	Apr 27 07:30:24 occult sendmail[18032]: HAA18032: from=<owner-macbsd-general@NetBSD.ORG>, size=1693, class=-30, pri=85693, nrcpts=1, msgid=<9504261835.AA06252@ferrari.libra.loral.com>, proto=ESMTP, relay=root@student.uq.edu.au [130.102.2.20]
	Apr 27 07:31:46 occult pppd[29572]: Hangup (SIGHUP)
	Apr 27 07:31:47 occult pppd[29572]: Exit.

Since pppd has cleaned up nicely, it's restored the routing table to:

	Routing tables

	Internet:
	Destination      Gateway            Flags     Refs     Use  Interface
	127.0.0.1        127.0.0.1          UH          2      623  lo0

	XNS:
	Destination      Gateway            Flags     Refs     Use  Interface

As a test, I ping localhost's IP and end up with another ping hanging:

  UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT TT       TIME COMMAND
    0 18134  1557   1 -20  0   172   84 soopts D    p0    0:00.21 ping 127.0.0.17
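
A small sketch of how listings like the two above could be scanned for
processes stuck on a given wait channel. The function name stuck_processes is
made up for this example, and it assumes 'ps' long-format output with a header
row and a non-empty WCHAN column on every line of interest (a blank WCHAN
would shift the columns and is not handled).

```python
SAMPLE = """\
  UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT TT       TIME COMMAND
    0 17801    96   1 -20  0   172   84 soopts D    ??    0:00.15 ping -c 1 -i 30 130.120.2.15
    0 18134  1557   1 -20  0   172   84 soopts D    p0    0:00.21 ping 127.0.0.17
"""


def stuck_processes(ps_output, wchan):
    """Return (pid, command) pairs for processes in state D on `wchan`."""
    lines = ps_output.strip().splitlines()
    header = lines[0].split()
    col = {name: i for i, name in enumerate(header)}
    found = []
    for line in lines[1:]:
        # Split into as many fields as the header has, so that the
        # COMMAND column keeps its embedded spaces.
        fields = line.split(None, len(header) - 1)
        if len(fields) < len(header):
            continue  # short or malformed line, skip it
        if fields[col["WCHAN"]] == wchan and fields[col["STAT"]].startswith("D"):
            found.append((int(fields[col["PID"]]), fields[col["COMMAND"]].strip()))
    return found


if __name__ == "__main__":
    for pid, command in stuck_processes(SAMPLE, "soopts"):
        print(pid, command)
```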

This only seems to happen after a fairly long time of being up. Ping gets
run every 30 seconds (should slow that down I guess), and I did a fair bit
of ftp'ing last night. The last message in was an smtp.

I spent a while reading through kernel sources, but all that achieved was
this weird sensation in my leg, cos I was sitting weirdly.

On previous occasions when this network lockup has occurred, connections
already in place (e.g. telnets) are still usable. It only seems to be
things that make new connections (they call setsockopt(), I suppose).
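
One way to test that guess without tying up a terminal would be a time-boxed
check of whether opening a fresh socket and setting an option on it returns at
all. The sketch below is only an illustration: completes_within and
new_socket_with_option are hypothetical names, SO_BROADCAST is an arbitrary
choice of option, and whether this exercises the same kernel path as 'ping'
is an assumption.

```python
import socket
import threading


def completes_within(action, timeout):
    """Run action() in a daemon thread; True if it returns within timeout."""
    done = threading.Event()

    def run():
        try:
            action()
        except OSError:
            pass  # an error still counts as returning, i.e. not a hang
        done.set()

    threading.Thread(target=run, daemon=True).start()
    return done.wait(timeout)  # False means the action is still blocked


def new_socket_with_option():
    """Create a new UDP socket and set one socket-level option on it."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    finally:
        s.close()


if __name__ == "__main__":
    if completes_within(new_socket_with_option, 5.0):
        print("new socket + setsockopt returned")
    else:
        print("new socket + setsockopt is hanging")
```

The thread is only there so the caller gets an answer; if the call really is
stuck in the kernel, the thread itself stays stuck.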

Any ideas anyone, or is this an old/known/fixed/trendy network bug?

d


Appendix
--------
The uptime as of this email was

Thu Apr 27 09:32:01 EST 1995
 9:32AM  up 1 day, 23:38, 4 users, load averages: 0.55, 0.19, 0.08

NetBSD occult.fnarg.net.au 1.0 NetBSD 1.0 (OCCULT) #4: Thu Dec 22 09:22:18 EST 1994     d@occult.fnarg.net.au:/usr/src/sys/arch/mac68k/compile/OCCULT mac68k

This kernel has been my 'netbsd.working'. It now gets booted by booter1.6 and
doesn't seem to mind at all.

# ls -i /netbsd /netbsd.working
16 1456 -rwxr-xr-x  2 root  wheel  734662 Jan  6 14:26 /netbsd
16 1456 -rwxr-xr-x  2 root  wheel  734662 Jan  6 14:26 /netbsd.working

IIsi FPU 17M ram standard video. No ether. Modem on tty00 at 19200 baud, 
dumb terminal on tty01 at 9600 baud. I login to the tty01 getty and
run GNU screen.

pppstats as of this email are:
# pppstats
    in   pack   comp uncomp    err |    out   pack   comp uncomp     ip
17419196  42417      0      0     89 | 2084337  36536      0      0  36536
     0      0      0      0      0 |      0      0      0      0      0
     0      0      0      0      0 |      0      0      0      0      0
     0      0      0      0      0 |      0      0      0      0      0

-- 
David Leonard                            BE(Comp)/BCompSc 5th year student
The University of Queensland             s160828@student.uq.edu.au