Current-Users archive


Re: Noisy ipmi0 after bootup? (really other kernel noise)



On Mon, 8 Sep 2008, Simon Burge wrote:

OK, if you apply the attached patches, you should see the following
changes:

1. ipmi should report CRITUNDER events for the fans that are at 0 RPM

2. you should get only one event for each fan

With this I get:

   ipmi0: critical under limit on 'Fan8/CPU2'
   ipmi0: critical under limit on 'Fan7/CPU1'
   ipmi0: critical under limit on 'Fan6'
   ipmi0: critical under limit on 'Fan5'
   ipmi0: critical under limit on 'Fan4'

All "under", and no double messages.  There's still the question of why
these are reported at all - since the value is always zero I suspect
they really don't have fans, but I'm not sure how we should detect this.

OK, the results are as I expected. I already committed the changes to sysmon to prevent the double notifications, and with your confirmation here I'll go ahead and commit the ipmi changes to differentiate between over-limit and under-limit conditions.
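
To make the over/under distinction concrete, here is a minimal sketch of the kind of check involved. It is not the code from the patch: the enum, the struct and classify_reading() are hypothetical names made up for this example, and the limits in the usage note assume the usual ipmitool column order (lower critical is the second threshold column, upper critical the fifth).

```c
#include <stdbool.h>

/* Hypothetical names throughout; this is not the sysmon or ipmi.c API. */
enum limit_state { LIMIT_OK, LIMIT_CRIT_UNDER, LIMIT_CRIT_OVER };

struct crit_limits {
    bool   has_lower;   /* false where ipmitool shows "na" */
    double lower;       /* lower critical threshold */
    bool   has_upper;   /* false where ipmitool shows "na" */
    double upper;       /* upper critical threshold */
};

/* Report which critical limit, if any, a reading has crossed. */
enum limit_state
classify_reading(double value, const struct crit_limits *lim)
{
    if (lim->has_lower && value < lim->lower)
        return LIMIT_CRIT_UNDER;
    if (lim->has_upper && value > lim->upper)
        return LIMIT_CRIT_OVER;
    return LIMIT_OK;
}
```

With a fan limit of { true, 300.0, false, 0.0 }, a 0 RPM reading classifies as LIMIT_CRIT_UNDER, which matches the "critical under limit" messages above. A temperature sensor with only an upper limit can never report "under".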

As for detecting missing fans, ideally it would be the physical sensor's and/or BIOS's job to figure out that there's nothing attached to the fan header. I'm not a hardware type, but I would expect that it could detect that there was zero current flowing and properly report a "not present" condition.

Others have suggested that we could ignore fan sensors that have never reported a non-zero value, but I'm not sure that's the right answer. As you can see from the ipmitool output, it's reporting an "nr" (non-recoverable) state, not "na".

And the output of "ipmitool ... sensor":

CPU Temp 1       | 30.000     | degrees C  | ok    | na        | na        | na        | 76.000    | 78.000    | 80.000
CPU Temp 2       | 24.000     | degrees C  | ok    | na        | na        | na        | 76.000    | 78.000    | 80.000
CPU Temp 3       | na         | degrees C  | na    | na        | na        | na        | 76.000    | 78.000    | 80.000
CPU Temp 4       | na         | degrees C  | na    | na        | na        | na        | 76.000    | 78.000    | 80.000
Sys Temp         | 28.000     | degrees C  | ok    | na        | na        | na        | 76.000    | 78.000    | 80.000
CPU1 Vcore       | 1.192      | Volts      | ok    | 1.056     | 1.064     | 1.072     | 1.624     | 1.632     | 1.640
CPU2 Vcore       | 1.192      | Volts      | ok    | 1.056     | 1.064     | 1.072     | 1.624     | 1.632     | 1.640
3.3V             | 3.312      | Volts      | ok    | 2.912     | 2.928     | 2.944     | 3.648     | 3.664     | 3.680
5V               | 4.992      | Volts      | ok    | 4.416     | 4.440     | 4.464     | 5.520     | 5.544     | 5.568
12V              | 11.808     | Volts      | ok    | 10.464    | 10.560    | 10.656    | 13.344    | 13.440    | 13.536
-12V             | -12.300    | Volts      | ok    | -10.500   | -10.600   | -10.700   | -13.300   | -13.400   | -13.500
1.5V             | 1.488      | Volts      | ok    | 1.296     | 1.312     | 1.328     | 1.664     | 1.680     | 1.696
5VSB             | 4.896      | Volts      | ok    | 4.416     | 4.440     | 4.464     | 5.520     | 5.544     | 5.568
VBAT             | 3.216      | Volts      | ok    | 2.912     | 2.928     | 2.944     | 3.648     | 3.664     | 3.680
Fan1             | 6300.000   | RPM        | ok    | 200.000   | 300.000   | 400.000   | na        | na        | na
Fan2             | 6500.000   | RPM        | ok    | 200.000   | 300.000   | 400.000   | na        | na        | na
Fan3             | 6100.000   | RPM        | ok    | 200.000   | 300.000   | 400.000   | na        | na        | na
Fan4             | 0.000      | RPM        | nr    | 200.000   | 300.000   | 400.000   | na        | na        | na
Fan5             | 0.000      | RPM        | nr    | 200.000   | 300.000   | 400.000   | na        | na        | na
Fan6             | 0.000      | RPM        | nr    | 200.000   | 300.000   | 400.000   | na        | na        | na
Fan7/CPU1        | 0.000      | RPM        | nr    | 200.000   | 300.000   | 400.000   | na        | na        | na
Fan8/CPU2        | 0.000      | RPM        | nr    | 200.000   | 300.000   | 400.000   | na        | na        | na
Intrusion        | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
Power Supply     | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
CPU0 Internal E  | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
CPU1 Internal E  | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
CPU Overheat     | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
Thermal Trip0    | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
Thermal Trip1    | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na


One more thing to notice - are the ipmi sensors in some reverse-ordered
linked list?  The order of the sensors from envstat seems reversed.

Looks like, internally to ipmi.c, the sensors are added to the head of its own SLIST, and then later on we walk through this list from head to tail to attach the sensors to sysmon. This has the side effect of reversing the list's order. If this is particularly annoying, it should not be too difficult to change this from an SLIST to a SIMPLEQ.
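
To illustrate the ordering difference, here is a minimal userland sketch. It is not the ipmi.c code: struct sensor and the two walk_* helpers are made up for the example, and it assumes a <sys/queue.h> that provides both the SLIST and SIMPLEQ macro families (NetBSD's does; on systems that lack SIMPLEQ, the STAILQ macros are the equivalent).

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for the driver's per-sensor record. */
struct sensor {
    const char *name;
    SLIST_ENTRY(sensor)   s_link;   /* linkage for the SLIST variant */
    SIMPLEQ_ENTRY(sensor) q_link;   /* linkage for the SIMPLEQ variant */
};

SLIST_HEAD(sensor_slist, sensor);
SIMPLEQ_HEAD(sensor_simpleq, sensor);

/*
 * Insert each sensor at the head, then walk head to tail.
 * out[] receives the names in walk order; returns the count.
 */
size_t
walk_slist(struct sensor *sensors, size_t n, const char **out)
{
    struct sensor_slist head;
    struct sensor *s;
    size_t i, k = 0;

    SLIST_INIT(&head);
    for (i = 0; i < n; i++)
        SLIST_INSERT_HEAD(&head, &sensors[i], s_link);
    SLIST_FOREACH(s, &head, s_link)
        out[k++] = s->name;     /* newest entry comes out first */
    return k;
}

/* Same walk, but a SIMPLEQ allows cheap insertion at the tail. */
size_t
walk_simpleq(struct sensor *sensors, size_t n, const char **out)
{
    struct sensor_simpleq head;
    struct sensor *s;
    size_t i, k = 0;

    SIMPLEQ_INIT(&head);
    for (i = 0; i < n; i++)
        SIMPLEQ_INSERT_TAIL(&head, &sensors[i], q_link);
    SIMPLEQ_FOREACH(s, &head, q_link)
        out[k++] = s->name;     /* insertion order is preserved */
    return k;
}
```

Given sensors added as Fan1, Fan2, Fan3, walk_slist() yields Fan3, Fan2, Fan1 while walk_simpleq() yields Fan1, Fan2, Fan3. The cost of the SIMPLEQ is one extra pointer in the list head to track the tail.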

Does maintaining the order of the sensor list appeal to others?


----------------------------------------------------------------------
|   Paul Goyette   | PGP DSS Key fingerprint: |  E-mail addresses:   |
| Customer Service | FA29 0E3B 35AF E8AE 6651 |  paul%whooppee.com@localhost   |
| Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette%juniper.net@localhost |
----------------------------------------------------------------------

