Current-Users archive


Re: Noisy ipmi0 after bootup? (really other kernel noise)



On Fri, 5 Sep 2008, Paul Goyette wrote:

 ipmi0: critical over limit on 'Fan8/CPU2'
 ipmi0: critical over limit on 'Fan8/CPU2'
 ipmi0: critical over limit on 'Fan7/CPU1'
 ipmi0: critical over limit on 'Fan7/CPU1'
 ipmi0: critical over limit on 'Fan6'
 ipmi0: critical over limit on 'Fan6'
 ipmi0: critical over limit on 'Fan5'
 ipmi0: critical over limit on 'Fan5'
 ipmi0: critical over limit on 'Fan4'
 ipmi0: critical over limit on 'Fan4'

OK, while I wrestle with having a proper method of detecting a missing vs broken fan, I've figured out why you're getting duplicate reports.

It seems that the ipmi device is actually providing two separate items to be monitored for each sensor! It is monitoring the sensor's value being within specified limits, AND it is monitoring a general "critical" indicator. Each of these is an independent monitor, and you're seeing each of them as a separate event. For what it's worth, I'm also seeing double on my acpitz sensors.

I don't think this happened previously, so I've obviously broken something. I'll keep at it.

Yup, I broke something! In particular, there used to be a check to make sure that the event/state being reported was appropriate for the item being monitored. So if the state was ENVSYS_SCRITOVER, it would be reported only for PENVSYS_EVENT_CRITOVER. When I grouped all the limit-related EVENTs together, I erroneously included PENVSYS_EVENT_CRITICAL.

The attached patch will once again separate PENVSYS_EVENT_CRITICAL from the rest.
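
To illustrate the check described above, here is a minimal standalone sketch. It is not the kernel code: the enum names and the helper state_matches_monitor() are made up for illustration, and only the idea (a monitor reports a state only if that state is appropriate for its type) comes from the patch.

```c
/*
 * Hypothetical stand-ins for the ENVSYS_S* states and the two kinds
 * of monitor discussed above. The real constants live in the envsys
 * headers and have different names and values.
 */
enum sketch_state {
    S_VALID,
    S_CRITICAL,
    S_CRITOVER,
    S_CRITUNDER
};

enum sketch_monitor {
    MON_CRITICAL,   /* the general "critical" indicator */
    MON_LIMITS      /* value-within-limits monitoring */
};

/*
 * Return 1 if a change to state 'st' is appropriate to report for a
 * monitor of kind 'mon', 0 otherwise.
 */
static int
state_matches_monitor(enum sketch_monitor mon, enum sketch_state st)
{
    switch (mon) {
    case MON_CRITICAL:
        /* Only "valid" and "critical" make sense here. */
        return st == S_VALID || st == S_CRITICAL;
    case MON_LIMITS:
        /* Limit monitoring handles the over/under states. */
        return st == S_VALID || st == S_CRITOVER ||
            st == S_CRITUNDER;
    }
    return 0;
}
```

With a check like this, a fan reporting S_CRITOVER is accepted by the MON_LIMITS monitor but rejected by the MON_CRITICAL monitor, so only one message is logged per state change instead of two.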


----------------------------------------------------------------------
|   Paul Goyette   | PGP DSS Key fingerprint: |  E-mail addresses:   |
| Customer Service | FA29 0E3B 35AF E8AE 6651 |  paul%whooppee.com@localhost   |
| Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette%juniper.net@localhost |
----------------------------------------------------------------------
Index: sysmon_envsys_events.c
===================================================================
RCS file: /cvsroot/src/sys/dev/sysmon/sysmon_envsys_events.c,v
retrieving revision 1.57
diff -u -p -r1.57 sysmon_envsys_events.c
--- sysmon_envsys_events.c      4 Sep 2008 21:54:51 -0000       1.57
+++ sysmon_envsys_events.c      6 Sep 2008 02:52:25 -0000
@@ -55,7 +55,6 @@ struct sme_sensor_event {
 
 static const struct sme_sensor_event sme_sensor_event[] = {
        { ENVSYS_SVALID,        PENVSYS_EVENT_NORMAL },
-       { ENVSYS_SCRITICAL,     PENVSYS_EVENT_CRITICAL },
        { ENVSYS_SCRITOVER,     PENVSYS_EVENT_CRITOVER },
        { ENVSYS_SCRITUNDER,    PENVSYS_EVENT_CRITUNDER },
        { ENVSYS_SWARNOVER,     PENVSYS_EVENT_WARNOVER },
@@ -606,7 +605,6 @@ sme_events_worker(struct work *wk, void 
        /*
         * For hardware and user range limits, send event if state has changed
         */
-       case PENVSYS_EVENT_CRITICAL:
        case PENVSYS_EVENT_HW_LIMITS:
                if (edata->state != see->see_evsent) {
                        for (i = 0; sse[i].state != -1; i++)
@@ -628,6 +626,25 @@ sme_events_worker(struct work *wk, void 
                break;
 
        /*
+        * For critical state monitoring, only two event values are valid:
+        *      ENVSYS_SVALID or ENVSYS_SCRITICAL
+        * Send corresponding event if state has changed.
+        */
+       case PENVSYS_EVENT_CRITICAL:
+               if (edata->state != see->see_evsent) {
+                       if (edata->state == ENVSYS_SVALID) {
+                               sysmon_penvsys_event(&see->see_pes,
+                                                    PENVSYS_EVENT_NORMAL);
+                               see->see_evsent = edata->state;
+                       } else if (edata->state == ENVSYS_SCRITICAL) {
+                               sysmon_penvsys_event(&see->see_pes,
+                                                    PENVSYS_EVENT_CRITICAL);
+                               see->see_evsent = edata->state;
+                       }
+               }
+               break;
+
+       /*
         * if value_cur is not normal (battery) or online (drive),
         * send the event...
         */

