Port-sparc64 archive


Re: SunFire v240 hardware question



On Dec 2, 2012, at 2:17 PM, Chris Ross wrote:

> So I have a SunFire V240 I'm planning to swap in to replace an older 
> UltraSPARC-II machine (E420R).  The V240 is reporting:
> 
> SC Alert: CPU_FAN @ MB.P1.F0.RS has FAILED.
> 
> which I know means one of the [two] heatsink/fan units has a failed fan.  I'm 
> using this list to reach people who may well understand the problem, and the 
> solutions, and I have two questions.
> 
>  (1) Is this a problem?  I mean, if I'm not going to be pushing this machine 
> hard, do I need to worry about it?  (It will be mostly a mail server, pretty 
> lightly loaded, CPU-wise)

We have a couple of V240s afflicted with this ailment.

One of them actually has 2 failed fans:

:1:5 [/] # grep kern /var/adm/messages | nawk '{ print $5, $6, $7, $8, $9, $10, $11, $12 }' | sort | uniq
rmclomv: [ID 132814 kern.error] CPU_FAN at MB.P0.F0.RS Failed
rmclomv: [ID 261796 kern.error] CPU_FAN at MB.P1.F0.RS Failed
rmclomv: [ID 556555 kern.error] CPU_FAN at MB.P1.F0.RS Faulted
rmclomv: [ID 946266 kern.error] CPU_FAN at MB.P0.F0.RS Faulted

To make sure they're not going nuclear, we have "cron" jobs (on Solaris 10)
that do

/usr/sbin/prtpicl -v -c temperature-sensor | /usr/bin/egrep 'mb|Temperature'

to monitor the temperature sensors.
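
For anyone wanting the cron job to complain only when something is actually
hot, here is a minimal sketch of a filter over that same output.  It is not
what we run: the function name check_temps, the LIMIT default of 85 and the
AWK variable are all made up for illustration, and the limit is in whatever
units the sensors report.  It assumes the layout shown in this mail, with a
"name (temperature-sensor, ...)" line followed by a ":Temperature" line.

```shell
#!/bin/sh
# Hypothetical threshold check for temperature-sensor readings.
# LIMIT is an assumed value, not a Sun-documented one.
LIMIT=${LIMIT:-85}
# Assumption: an awk that supports -v; on Solaris 10 set AWK=nawk.
AWK=${AWK:-awk}

# Read prtpicl temperature-sensor output on stdin and print
# "sensor reading" for each sensor whose reading exceeds $1.
check_temps() {
    "$AWK" -v limit="$1" '
        # Sensor header line: remember the sensor name.
        $2 == "(temperature-sensor," { name = $1 }
        # Reading line: report it if it is above the limit.
        $1 == ":Temperature" && $2 + 0 > limit + 0 { print name, $2 }
    '
}

# Example cron usage:
#   /usr/sbin/prtpicl -v -c temperature-sensor | check_temps "$LIMIT"
```

Since "cron" mails any output to the job's owner, printing nothing when all
readings are under the limit means you only hear about it when a sensor is
over.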

On the one with 2 failed fans they read

:1:7 [/] # psrinfo -v
Status of virtual processor 0 as of: 12/02/2012 17:22:03
  on-line since 03/31/2012 03:31:57.
  The sparcv9 processor operates at 1002 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 1 as of: 12/02/2012 17:22:03
  on-line since 03/31/2012 03:31:58.
  The sparcv9 processor operates at 1002 MHz,
        and has a sparcv9 floating point processor.

 mb_p0_t_core (temperature-sensor, ad00000ad1)
 :Temperature    51 
 :name   mb_p0_t_core 
 mb_p1_t_core (temperature-sensor, ad00000ad9)
 :Temperature    51 
 :name   mb_p1_t_core 
 mb_t_enc (temperature-sensor, ad00000ae1)
 :Temperature    22 
 :name   mb_t_enc 

which actually isn't too bad.

On the other one

:1:355 [/] # psrinfo -v
Status of virtual processor 0 as of: 12/02/2012 17:22:58
  on-line since 12/31/1999 16:25:16.
  The sparcv9 processor operates at 1503 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 1 as of: 12/02/2012 17:22:58
  on-line since 12/31/1999 16:25:09.
  The sparcv9 processor operates at 1503 MHz,
        and has a sparcv9 floating point processor.

 mb_p0_t_core (temperature-sensor, 40ee00000b0b)
 :Temperature    81 
 :name   mb_p0_t_core 
 mb_p1_t_core (temperature-sensor, 40ee00000b13)
 :Temperature    69 
 :name   mb_p1_t_core 
 mb_t_enc (temperature-sensor, 40ee00000b1b)
 :Temperature    27 
 :name   mb_t_enc 

which as you can see is quite a bit higher.  The higher base value
is presumably due to the higher CPU speed, but we have no idea why
the P0 temperature is so much higher than the P1.

I have no idea about replacement because we're on Oracle maintenance
and these machines are being decommissioned soon anyways (to be
replaced with client LDoms on a forthcoming T4).

        - Greg


