Re: -10, spurious reboots and instability

To: David Brownlee <abs%absd.org@localhost>
Subject: Re: -10, spurious reboots and instability
From: BERTRAND Joël <joel.bertrand%systella.fr@localhost>
Date: Mon, 1 Jan 2024 10:54:47 +0100

David Brownlee wrote:

On Fri, 29 Dec 2023 at 16:29, BERTRAND Joël <joel.bertrand%systella.fr@localhost> wrote:


David Brownlee wrote:

On Wed, 27 Dec 2023 at 08:25, BERTRAND Joël <joel.bertrand%systella.fr@localhost> wrote:


          Hello,

          Yesterday, I replaced my system disk (raid0). The system then
rebuilt a 1 TB raid1 volume and crashed three or four times.

          First time :

[  5235.028358] uvm_fault(0xffffffff8190fbc0, 0xfffff6fc5a75b000, 2) -> e
[  5235.028358] fatal page fault in supervisor mode
[  5235.028358] trap type 6 code 0x2 rip 0xffffffff80ea1063 cs 0x8
rflags 0x10246 cr2 0xfffff6fc5a75bb98 ilevel 0 rsp 0xffffac04372f5e98
[  5235.028358] curlwp 0xfffff8b686f8a180 pid 0.17 lowest kstack
0xffffac04372f12c0
[  5235.028358] panic: trap
[  5235.028358] cpu2: Begin traceback...
[  5235.028358] vpanic() at netbsd:vpanic+0x183
[  5235.028358] panic() at netbsd:panic+0x3c
[  5235.028358] trap() at netbsd:trap+0xbaf
[  5235.028358] --- trap (number 6) ---
[  5235.028358] _atomic_swap_64() at netbsd:_atomic_swap_64+0x3
[  5235.028358] uvm_km_pgremove_intrsafe() at
netbsd:uvm_km_pgremove_intrsafe+0x6d
[  5235.028358] uvm_km_kmem_free() at netbsd:uvm_km_kmem_free+0x3b
[  5235.028358] gc_thread() at netbsd:gc_thread+0x7c
[  5235.028358] cpu2: End traceback...

[  5235.038351] dumping to dev 18,1 (offset=251919, size=4162814):
[  5235.038351] dump

          Of course, no crash dump was written. At first I tried removing swap,
and the system randomly locks up and reboots (maybe with the help of the
watchdog). After the rebuild completed, the system seems to be stable.

          Sorry, I'm unable to obtain more information.


Could I ask what controller this was using, and what vintage NetBSD-10
(BETA or RC_1)? There was an issue with the mfi controller which
showed up on earlier netbsd-10 BETA versions


         You can ;-)

         It's a RC_1:


OK, so pretty damn recent :-p

legendre# uname -a
NetBSD legendre.systella.fr 10.0_RC1 NetBSD 10.0_RC1 (CUSTOM) #10: Thu
Dec 21 09:51:28 CET 2023
root%legendre.systella.fr@localhost:/usr/src/netbsd-10/obj/sys/arch/amd64/compile/CUSTOM
amd64
and the tree was updated just before building the system.

         I should add that since the raid was successfully rebuilt, the system
has been stable. I don't remember the last time I rebuilt this volume.


So depending on when it was originally installed (netbsd-8?), could
whatever the issue is have been present in netbsd-9 as well?

Possible. The boot block indicates -8_BETA. But if I remember correctly, this server was installed with Toshiba disks, and it now runs with WD Gold disks as root devices. Thus, I suppose I have rebuilt the raid without trouble on a -9.x system.

         I only use raidframe on a regular SATA interface. The motherboard is
an Asus Z97Q (if I remember correctly):

legendre# lspci | grep SATA
00:1f.2 SATA controller: Intel Corporation 9 Series Chipset Family SATA
Controller [AHCI Mode]
08:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9120 SATA
6Gb/s Controller (rev 12)

         Raid0 and raid1 are connected to 00:1f.2 (wd2 to wd6).


OK, so definitely not mfi related...

I don't have any directly useful suggestions.

Depending on hardware availability and your tolerance for downtime, I'd
be tempted to try to reproduce on different hardware - cloning data to
another box, or temporarily moving disks across and then triggering a
raid rebuild to eliminate any overall issues with the base hardware.

Could the power supply be marginal - could rebuilding all disks and
some other activity be pushing it a little too hard? Running a few
copies of sysutils/cpuburn might flush out some issues if that was the
case

I don't think so, as this server is stable even when CPU usage is around 100%. Uptime is now greater than 5 days, and some bacula processes are spawned daily at 23h00. When bacula runs, CPU usage is very high for several hours.


	Regards,

	JB
