Subject: [Fwd: Re: Qube 2/NetBSD 1.6 regular instability]
To: None <port-cobalt@netbsd.org>
From: Rodrigo Fernandez-Vizarra <Rodrigo.Fdz-Vizarra@infonegocio.com>
List: port-cobalt
Date: 10/05/2003 20:01:04
This is a multi-part message in MIME format.
--------------030302060008060807000608
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Hi,
Ian, we are not the only ones with this problem...
I'm trying to develop a test to see whether it's only a network problem or
a network + HD I/O problem. Is there any kernel developer on the list?
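
One way such a test could be structured, as a minimal sketch only: it
assumes Python is available on the machine, and every name in it
(disk_write_rate, network_send_rate, discard_server, CHUNK) is a
hypothetical placeholder, not part of NetBSD or any existing tool. It
measures disk writes and TCP sends separately so each load can be
applied alone or together.

```python
import os
import socket
import tempfile
import threading
import time

CHUNK = 64 * 1024  # bytes per write/send; an arbitrary choice


def disk_write_rate(directory, total_bytes):
    """Write at least total_bytes to a temp file, fsync, return bytes/second."""
    payload = b"\0" * CHUNK
    start = time.time()
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        written = 0
        while written < total_bytes:
            f.write(payload)
            written += len(payload)
        f.flush()
        os.fsync(f.fileno())  # force the data out to the disk
    elapsed = max(time.time() - start, 1e-9)
    return written / elapsed


def network_send_rate(host, port, total_bytes):
    """Send at least total_bytes over TCP to host:port, return bytes/second."""
    payload = b"\0" * CHUNK
    start = time.time()
    with socket.create_connection((host, port)) as s:
        sent = 0
        while sent < total_bytes:
            s.sendall(payload)
            sent += len(payload)
    elapsed = max(time.time() - start, 1e-9)
    return sent / elapsed


def discard_server(port=0):
    """Start a one-shot TCP sink on localhost; return the port it listens on."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", port))
    srv.listen(1)

    def run():
        conn, _ = srv.accept()
        with conn:
            while conn.recv(CHUNK):  # read and throw away until EOF
                pass
        srv.close()

    threading.Thread(target=run, daemon=True).start()
    return srv.getsockname()[1]
```

To exercise the real network interface, the sink would have to run on
a second machine rather than on localhost. Running the disk test
alone, the network test alone, and then both at once for a long period
might show which kind of load brings the hang on sooner.
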
Regards,
Rodrigo
--------------030302060008060807000608
Content-Type: message/rfc822;
name="Re: Qube 2/NetBSD 1.6 regular instability"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="Re: Qube 2/NetBSD 1.6 regular instability"
Mime-Version: 1.0
Message-Id: <p05210601bba6035b3039@[128.40.218.142]>
In-Reply-To: <3F805125.9090803@infonegocio.com>
References: <20031002131010.GA21543@minimal.cx> <3F7DFED4.50900@infonegocio.com> <p05210600bba5ddce633f@[128.40.218.142]> <3F805125.9090803@infonegocio.com>
Date: Sun, 5 Oct 2003 18:31:39 +0100
To: Rodrigo Fernandez-Vizarra <Rodrigo.Fdz-Vizarra@infonegocio.com>
From: Frank Mattes <f.mattes@ucl.ac.uk>
Subject: Re: Qube 2/NetBSD 1.6 regular instability
Content-Type: text/plain; charset="us-ascii" ; format="flowed"
Dear Rodrigo,
I bought the Qube 2 as a private fileserver, mainly to back up files
via netatalk (my main machine was running Mac OS 9 at that
time). I could never copy ~2000 files at once to the Qube or back
(under netatalk); the machine always hung. I thought it was Linux,
and this was the main reason to install NetBSD; however, I had the
same experience under NetBSD.
I never took any performance measurements (we have a 100 Mbit network),
but the performance was never great (I can't remember the exact
figures).
My Qube had a disk crash a couple of months ago, and I just bought a
Maxtor 80 Gbyte drive yesterday, and haven't had a chance to install
NetBSD 1.6.1
(I hope a Maxtor drive was a good choice; it's 7200 rpm, I couldn't
get any 5000 rpm ones anymore, and the only drives around are from WD,
Maxtor or IBM/Hitachi)
Yes, please post my mail. I thought it went to the group anyway.
Frank
>Hi Frank,
>
>Did you have the same low performance problem with the original Linux
>too?... If you had the same problems with Linux, I'm starting to
>think it's a hardware problem :-( ... Did you get any logs related
>to the hangs in Linux?
>
>This is very annoying for me; having these low uptimes is not good,
>but I don't know how to trace the problem.
>
>Do you mind if I CC your message to the mailing list? Perhaps there's
>someone else with the same problem who can help. BTW, is your Qube2
>modified from the original one? New HD, more RAM, etc ...
>
>Regards,
>Rodrigo
>
>
>Frank Mattes wrote:
>
>>I noticed this with NetBSD 1.6 but also with the original
>>Linux installation. Downloading > 600 Mbyte images with netatalk or
>>ftp frequently caused hangs. I couldn't back up the Qube disk via
>>my Mac.
>>
>>Frank
>>
>>>I'm having instability problems with my Qube2 too.
>>>
>>>My Qube2 has 128 MB of RAM: 64 MB from the original setup and 64 MB
>>>more from another Qube2. I've replaced the original 13 GB HD with a
>>>40 GB Seagate Barracuda (if I remember correctly).
>>>
>>>I used to get many pmap_unwire messages with NetBSD 1.6. Now, with
>>>NetBSD 1.6.1, I don't have any in my logs... but it still hangs from
>>>time to time. I have noticed that when the system has high network
>>>activity it tends to hang sooner. Did you notice
>>>something similar? High network load -> hang (it takes one or two
>>>days to hang)
>>>
>>>I have similar network performance problems: it's supposed to be
>>>100/10 ethernet, but I never get more than 900Kb/s even with a
>>>crossover cable. I don't know if the problem is with the hardware
>>>or with the software (the driver).
>>>
>>>The worst thing is that I don't have a clue about how to trace the problem.
>>>
>>>Regards,
>>>Rodrigo
>>>
>>>Ian Spray wrote:
>>>
>>>>Hi all,
>>>>
>>>>I've been having problems with my Qube 2 running a custom 1.6 kernel for
>>>>some time and could do with some advice on how to go about troubleshooting
>>>>it. Using the original Linux 2.0 kernel that came with the system I got
>>>>over 60 days uptime, but as you can see from my live stats
>>>>(http://minimal.cx/uptime.php) I'm getting between 23 and 25
>>>>days. The big Linux stats were sadly lost in a NetBSD crash...
>>>>
>>>>The main change from the Linux days is the hard drive upgrade from 10GB to
>>>>120GB and the RAM increase from 96MB to 192MB. The RAM was bought from
>>>>Crucial as approved Cobalt Qube 2 RAM, and I spent ages checking the power
>>>>consumption figures of hard drives to ensure that the 120GB model was
>>>>within 10% of the values for the original 10GB one (it only exceeds the 10GB
>>>>figures at startup - operating currents are actually lower).
>>>>
>>>>The system logs typically show no useful information - the system simply
>>>>stops and so far I've not had it hooked up to a serial terminal to try to
>>>>get any sensible kernel debug output (this is obviously step one!). The
>>>>last dmesg does have a lot of pmap_unwire errors and also an IDE DMA
>>>>problem, but the most interesting thing is that the system dies in the
>>>>middle of writing out the pmap_unwire error:
>>>>
>>>>pmap_unwire: wiring for pmap 0x810fb2c0 va 0x7fffc000 didn't change!
>>>>pmap_unwire: wiring for pmap 0x810fbd00 va 0x7fffa000 didn't change!
>>>>pmap_unwire: wiring for pmap 0x810fb5e0 va 0x7fffa000 didn't change!
>>>>pmap_unwire: wiring for pmap 0x810fbe60 va 0x7fffa000 didn't change!
>>>>wd0a: DMA error reading fsbn 82942512 of 82942512-82942639 (wd0
>>>>bn 83515823; cn 82852 tn 15 sn 62), retrying
>>>>wd0: soft error (corrected)
>>>>pmap_unwire: wiring for pmap 0x810fbde0 va 0x10012000 didn't change!
>>>>pmap_unwire: wiring for pmap 0x810fb960 va 0x7fffc000 didn't change!
>>>>pmap_unwire: wiring for pmap 0x810fb7e0 va 0x1001a000 didn't change!
>>>>pmap_unwire: wiring for pmap 0x810fbdc0 va 0x7fffa000 didn't change!
>>>>pmap_unwire: wiring for pmap 0x810fb5e0 va 0x7fffa000 didn't change!
>>>>pmap_unwire: wiring for pmap 0x810fbce0 va 0x7fffa000 didn't change!
>>>>pmap_unwire: wiring for pmap 0x810fb9a0 va 0x7fffc000 didn't change!
>>>>pmap_unwire: wiring for pmap 0x810fb300 va 0x7fffa000 didn't change!
>>>>pmap_unwire: wiring for pmap 0x810fb300 va 0x7fffa000 didn't change!
>>>>pmap_unwire: wiring for pmap 0x810fb4e0 va 0x10012000 didn't change!
>>>>pmap_unwire: wiring for pmap 0x810fb9a0 va 0x7fffa000 didn't change!
>>>>pmap_unwire: wiring for pmap 0x810fbb40 va 0x7fffa000 didn't change!
>>>>pmap_unwire: wiring for pmap 0x810fb660 va 0x7fffa000 didn't change!
>>>>pmap_unwire: wiring fo\^C\^PTap 0x810fb860 va 0x7fffa000 didn't change!
>>>>Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002
>>>> The NetBSD Foundation, Inc. All rights reserved.
>>>>Copyright (c) 1982, 1986, 1989, 1991, 1993
>>>> The Regents of the University of California. All rights reserved.
>>>>
>>>>
>>>>I am also running with soft_deps on, but would have expected problems to
>>>>show up long before the 22 day mark - the system runs BIND, Apache, sshd,
>>>>mrtg, samba, spamd, exim and serves up quite a few large files from PHP
>>>>enabled web sites hosted on it. It can have a load peak of 14.60 (maybe
>>>>higher but that's all I've observed) due to some unfriendly Perl
>>>>jobs, but it tends to average no more than 4 in a typical day.
>>>>
>>>>I haven't seen anything mentioned in the CVS logs for
>>>>sys/arch/mips/mips/pmap.c that might indicate that the unwire message is
>>>>fixed in the MAIN branch, and I've also not seen any evidence to say that
>>>>pmap is even a problem. Does anyone else have a loaded Qube 2 with similar
>>>>problems ? I would assume not, or there would have been more emails like
>>>>this !
>>>>
>>>>I also experience really slow network I/O (in the archives) and am
>>>>wondering if I've simply got hardware that isn't perfect. I'm
>>>>currently open to even
>>>>the wildest suggestions, as the only option open to me at the moment is to
>>>>schedule a reboot every 14 days, which makes me little better than a
>>>>Windows admin :( I also need to have an alternative server in place before
>>>>I can do really tough testing/proper kernel+serial debug (it's too
>>>>important to do without), so I'm hoping to collect ideas whilst I'm putting
>>>>one together.
>>>>
>>>>About the only other thing I've changed is to increase kern.maxvnodes to
>>>>40000 as the command line response of the system was appalling with the
>>>>default 11000 (and something) value. The system is running behind an APC
>>>>BackUPS 600 and so the input power should be clean and stable.
>>>>
>>>>Thanks in advance,
--
Frank Mattes, MD e-mail: f.mattes@ucl.ac.uk
Department of Virology fax 0044(0)207 8302854
Royal Free Hospital and tel 0044(0)207 8302997
University College Medical School
London
--------------030302060008060807000608--