Subject: SUP lossage, take N...
To: None <current-users@NetBSD.ORG>
From: Jason Thorpe <thorpej@NetBSD.ORG>
List: current-users
Date: 03/30/1996 11:23:56
Ok ... I *really* think I have it, this time...
The SUP server got into this sort of state:
{root}netbsd# netstat -f inet | grep sup
tcp 39 0 netbsd.supfiles qabalah.cac.psu..1199 CLOSE_WAIT
tcp 39 0 netbsd.supfiles scrap.cs.colorad.1038 CLOSE_WAIT
tcp 93 0 netbsd.supfiles infpc01.informat.1075 CLOSE_WAIT
tcp 93 0 netbsd.supfiles tanelorn.tky.hut.1026 CLOSE_WAIT
tcp 39 0 netbsd.supfiles scrap.cs.colorad.1037 CLOSE_WAIT
tcp 39 0 netbsd.supfiles tanelorn.tky.hut.1025 CLOSE_WAIT
tcp 39 0 netbsd.supfiles scrap.cs.colorad.1036 CLOSE_WAIT
tcp 39 0 netbsd.supfiles scrap.cs.colorad.1035 CLOSE_WAIT
I killed and restarted the server, and I was able to sup allsrc in its
entirety...
And it looks like there is some success now:
{root}netbsd# netstat -f inet | grep sup
tcp 0 0 netbsd.supfiles talc.rendition.c.1386 SYN_RCVD
tcp 0 0 netbsd.supfiles gate.ene.unb.br.1331 ESTABLISHED
tcp 0 16200 netbsd.supfiles pingu.math.hokud.1549 ESTABLISHED
tcp 0 0 netbsd.supfiles pgoyette.bdt.com.1047 ESTABLISHED
tcp 0 0 netbsd.supfiles talc.rendition.c.1385 TIME_WAIT
tcp 0 0 netbsd.supfiles cynic.portal.ca.1157 TIME_WAIT
tcp 0 0 netbsd.supfiles pgoyette.bdt.com.1046 TIME_WAIT
tcp 0 0 netbsd.supfiles pgoyette.bdt.com.1045 TIME_WAIT
tcp 0 0 netbsd.supfiles pgoyette.bdt.com.1044 TIME_WAIT
tcp 0 15528 netbsd.supfiles crystal.PEAK.ORG.1492 ESTABLISHED
tcp 0 8259 netbsd.supfiles ftp.cs.umn.edu.10382 ESTABLISHED
tcp 0 0 netbsd.supfiles dogbert.cs.chalm.4743 ESTABLISHED
tcp 0 15687 netbsd.supfiles tia1.eskimo.com.1808 ESTABLISHED
tcp 0 15207 netbsd.supfiles white.dogwood.co.1218 ESTABLISHED
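For anyone who wants to watch for this on their own machine, here is a minimal sketch (not part of the SUP server or its scripts) that tallies TCP states from `netstat -f inet` output like the listings above. The function name `tally_states` and the six-column field layout are assumptions made for illustration. For reference, CLOSE_WAIT means the remote end has closed the connection and the local process has not yet closed its side; a non-zero Recv-Q (the second column) means there are bytes the local process has not read.

```python
from collections import Counter

def tally_states(netstat_output: str, service: str = "supfiles") -> Counter:
    """Count TCP connections per state for local addresses naming `service`.

    Assumes each connection line has six whitespace-separated fields:
    proto, recv-q, send-q, local address, foreign address, state.
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and anything that is not a TCP entry.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        # Only look at connections whose local address names the service.
        if service not in fields[3]:
            continue
        counts[fields[-1]] += 1
    return counts

# A few lines copied from the listings above, used as sample input.
sample = """\
tcp 39 0 netbsd.supfiles qabalah.cac.psu..1199 CLOSE_WAIT
tcp 39 0 netbsd.supfiles scrap.cs.colorad.1038 CLOSE_WAIT
tcp 0 0 netbsd.supfiles gate.ene.unb.br.1331 ESTABLISHED
tcp 0 0 netbsd.supfiles talc.rendition.c.1385 TIME_WAIT
"""

if __name__ == "__main__":
    for state, count in sorted(tally_states(sample).items()):
        print(state, count)
```

A pile of CLOSE_WAIT entries that keeps growing while ESTABLISHED stays at zero would look like the wedged state shown in the first listing.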
In case anyone's wondering why I mucked with it in the first place, I
needed to add some things to the current collection, and it became clear
that a revision-controlled, automated way of updating the server configs
was needed. Six typos later, the server decided to get confused (I can't
think of any reason why a failing script would have caused the server to
wedge in CLOSE_WAIT). The CLOSE_WAIT lossage could have been caused by
intermediate circuit problems ...
Again, I apologize for any inconvenience this may have caused you. If
you observe any further problems, please let me know... (I know I'm going
to regret saying that... :-)
Jason R. Thorpe
NetBSD Core Group
<thorpej@NetBSD.ORG>