Subject: More info on my 1.6 stability problems
To: None <port-cobalt@NetBSD.org>
From: ian <cobalt@minimal.cx>
List: port-cobalt
Date: 01/03/2004 08:48:52
Hi all,
I've not got any further in terms of kernel debugging, but as the Qube
is/was my main server I've been setting up a second machine to do the
work and this has had an expected result: a more stable Qube. At this
point in time the Qube is still doing all the SMTP/IMAP/HTTP/SMB stuff
- it's only had the DNS and DHCP removed which weren't a huge load but
were easiest to remove without any downtime.
Previously I was getting 23 to 25 day uptimes before the system would
simply lock up and need a manual reboot, and this is the data as I'm
writing this email:
     #               Uptime | System                                     Boot up
----------------------------+---------------------------------------------------
->   1    83 days, 19:46:03 | NetBSD 1.6                Sat Oct 11 13:42:57 2003
     2    25 days, 08:18:16 | NetBSD 1.6                Sat Sep  6 08:58:46 2003
     3    24 days, 03:40:25 | NetBSD 1.6                Mon Aug  4 16:29:38 2003
     4    23 days, 01:21:42 | NetBSD 1.6                Sat Jul 12 15:06:02 2003
     5     8 days, 08:48:14 | NetBSD 1.6                Thu Aug 28 20:12:10 2003
     6     5 days, 23:20:34 | NetBSD 1.6                Sun Oct  5 13:30:03 2003
     7     3 days, 18:20:22 | NetBSD 1.6                Wed Oct  1 18:56:33 2003
----------------------------+---------------------------------------------------
There have been zero hardware changes between the top four times - the
screws have remained in their sockets, although the ambient temperature
here has dropped. The main difference is that I have removed my
802.11b access point from the secondary ethernet port, and as a result
have stopped using IPSec encryption over the second interface and have
removed IPFilter which was firewalling everything on the second
interface except IPSec, DHCP and DNS traffic (the kernel is the same).
IPFilter was also doing packet logging, so my /etc/rc.conf now contains
these "NO" entries (among others):
ipfilter="NO"
ipnat="NO"
ipmon="NO"
racoon="NO"
ipsec="NO"
pgsql="NO"
The other thing I did after the number 2 crash was to reset the system
and then issue a shutdown command to cleanly reboot the machine - I
can't be sure of the significance of that, though.
From reading the cobalt22 (Linux 2.2/2.4) mailing list early in 2003
(the info may have been from 2002, though) it did appear that the Linux
port was ok to use unless serial and ethernet activity happened at the
same time, and that having both ethernet interfaces active (possibly
along with the serial port) caused a lockup in pretty short order. It didn't appear to
matter whether I was using my machine when it would lock up, but
because I was running routed as well as samba there would be broadcast
packets using the second interface (or at least being blocked by
IPFilter) even if I wasn't at home.
Nothing conclusive, except that I now consider my PSU, RAM and hard drive
to be ok, and I think that when I've shifted all of my data onto the new
system I'll start looking at the ethernet throughput issues I was
having: hopefully the slow data rate and the crashes will be related :)
Yeah, well I can dream...
This might explain the widely differing views on system uptime on the
list, though.
TTFN,
--
ian.
GPG Fingerprint: D170 35A3 C858 6E85 9B5B 1557 4CD5 6F6F E176 2D0A