Port-sparc64 archive


Re: Strange deadlock



BERTRAND Joël wrote:
Michael wrote:
Hello,

On Mon, 21 Jul 2014 21:30:47 +0200
BERTRAND Joël <joel.bertrand%systella.fr@localhost> wrote:

    Hello,

    Some weeks ago, I reported here that I have a stability issue
with my Blade2000 running NetBSD:

trap type 0x68: cpu 1, pc=410657c0 npc=410657ac
pstate=0xffffffff99820092<,PEF,IE>
Skipping crash dump on recursive panic
panic: +fast data access MMU miss
cpu1: Begin traceback...
cpu1: End traceback...
cpu0: shutting down
cpu1: rebooting

    That was with the 6.99.40 kernel. I have replaced a faulty memory module,
rebuilt all the packages I use from source (perl, gtk and all dependencies)
and built a new kernel:

NetBSD legendre.systella.fr 6.99.47 NetBSD 6.99.47 (CUSTOM) #1: Sun Jul
20 15:21:37 CEST 2014
root%legendre.systella.fr@localhost:/usr/src/sys/arch/sparc64/compile/CUSTOM
sparc64

    CUSTOM is GENERIC.MP with some minor modifications:

sd0     at scsibus4 target 0 lun 0
sd1     at scsibus4 target 1 lun 0
cd0     at scsibus0 target 6 lun 0
...
# Wedge support
options         DKWEDGE_AUTODISCOVER
options         DKWEDGE_METHOD_GPT

    With the 6.99.40 kernel, this workstation was stable enough to build
packages from source (load average: 6). With 6.99.47, it hung after
just a few hours.

    At 12:48 (CEST), this workstation locked up under heavy I/O (gcc48
compilation). It looked very much like a deadlock, and I had to reboot
it with the power button. The console didn't respond, the network didn't
answer pings... There was nothing on the console or the serial line. No
panic and no dump file.

    I don't know whether it is sparc specific, and I don't have any other
information. Has anyone ever seen the same problem?

I've seen my sb2500 spontaneously reboot under heavy load with .47;
it's been rock stable with .44 (and many, many older revisions).

     In my case, it's not a reboot, only a complete deadlock and I
haven't found any sparc64 specific modification between .40 that panics
and reboots and .47 that locks...

     Regards,

     JKB

I think I have found why my Blade2000 was not stable: one of the CPUs was dying. With a new one, the system seems to be stable with a load average greater than 6 for several days (kernel 6.99.49).

        I just have to buy new CPUs :-(

        Regards,

        JKB


