Subject: More info on my 1.6 stability problems
To: None <port-cobalt@NetBSD.org>
From: ian <cobalt@minimal.cx>
List: port-cobalt
Date: 01/03/2004 08:48:52
Hi all,

I've not got any further in terms of kernel debugging, but as the Qube  
is/was my main server I've been setting up a second machine to do the  
work and this has had an expected result: a more stable Qube.  At this
point the Qube is still doing all the SMTP/IMAP/HTTP/SMB work; only DNS
and DHCP have been moved off, which weren't a huge load but were the
easiest to remove without any downtime.

Previously I was getting 23 to 25 day uptimes before the system would  
simply lock up and need a manual reboot, and this is the data as I'm  
writing this email:

     #               Uptime | System                  Boot up
----------------------------+-------------------------------------------------
->   1    83 days, 19:46:03 | NetBSD 1.6              Sat Oct 11 13:42:57 2003
     2    25 days, 08:18:16 | NetBSD 1.6              Sat Sep  6 08:58:46 2003
     3    24 days, 03:40:25 | NetBSD 1.6              Mon Aug  4 16:29:38 2003
     4    23 days, 01:21:42 | NetBSD 1.6              Sat Jul 12 15:06:02 2003
     5     8 days, 08:48:14 | NetBSD 1.6              Thu Aug 28 20:12:10 2003
     6     5 days, 23:20:34 | NetBSD 1.6              Sun Oct  5 13:30:03 2003
     7     3 days, 18:20:22 | NetBSD 1.6              Wed Oct  1 18:56:33 2003
----------------------------+-------------------------------------------------
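
As a rough sanity check on how unusual the current run is, the figures
in the table can be compared directly.  This is only a sketch: the
variable names are made up for illustration, and it assumes entries 2
to 4 are the runs that ended in a lock-up after 23 to 25 days.

```python
from datetime import timedelta

# Uptimes copied from entries 1-4 of the table.  Assumption: entries 2-4
# are the runs that ended in a lock-up; entry 1 is the current run.
current = timedelta(days=83, hours=19, minutes=46, seconds=3)
previous = [
    timedelta(days=25, hours=8, minutes=18, seconds=16),
    timedelta(days=24, hours=3, minutes=40, seconds=25),
    timedelta(days=23, hours=1, minutes=21, seconds=42),
]

# Mean of the earlier long runs, and how many times longer the
# current run is than that mean.
mean_previous = sum(previous, timedelta()) / len(previous)
ratio = current / mean_previous

print("mean of earlier long runs:", mean_previous)
print("current run is %.2fx that mean" % ratio)
```

Three earlier runs is a tiny sample, so this says nothing about the
cause; it only puts a number on "more stable".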

There have been zero hardware changes across the top four entries - the
screws have remained in their sockets, although the ambient temperature
here has dropped.  The main difference is that I have removed my
802.11b access point from the secondary ethernet port, and as a result
have stopped using IPsec encryption over the second interface and have
removed IPFilter, which was firewalling everything on the second
interface except IPsec, DHCP and DNS traffic (the kernel is the same).
I also had it doing packet logging, so my /etc/rc.conf now has these
(among other) "NO" entries:

ipfilter="NO"
ipnat="NO"
ipmon="NO"
racoon="NO"
ipsec="NO"
pgsql="NO"

The other thing I did after the number 2 crash was to reset the system  
and then issue a shutdown command to cleanly reboot the machine - I  
can't be sure of the significance of that, though.

From reading the cobalt22 (Linux 2.2/2.4) mailing list early in 2003
(the info may have been from 2002, though) it did appear that the Linux  
port was ok to use unless serial and ethernet activity happened at the  
same time, and having both ethernet interfaces active (with the serial,  
possibly) caused a lock up in pretty short order.  It didn't appear to  
matter whether I was using my machine when it would lock up, but  
because I was running routed as well as samba there would be broadcast  
packets using the second interface (or at least being blocked by  
IPFilter) even if I wasn't at home.

Nothing conclusive, except that I now consider my PSU, RAM and hard drive
to be ok.  Once I've shifted all of my data onto the new system I'll
start looking at the ethernet throughput issues I was having: hopefully
the slow data rate and the crashes will be related :)
Yeah, well I can dream...

This might explain the wildly differing views on system uptime on the
list though.

TTFN,
-- 
ian.
GPG Fingerprint: D170 35A3 C858 6E85 9B5B  1557 4CD5 6F6F E176 2D0A