tech-userlevel archive


Re: removing an envsys temperature limit



On 2016-02-16 15:50, Constantine A. Murenin wrote:
On 16 February 2016 at 04:47, Edgar Fuß <ef%math.uni-bonn.de@localhost> wrote:
Is this a continuation of your prior thread:
     http://mail-index.netbsd.org/tech-userlevel/2016/01/30/msg009639.html ?
No. That was about a 54 degC CritMax for CPU Temp.

So, this is about the 45degC CritMin part?  I see.

Full envsys output from the prior thread for reference:
     http://mail-index.netbsd.org/tech-userlevel/2016/02/01/msg009646.html


If so, what you're effectively asking is for
http://bxr.su/n/sys/dev/i2c/dbcool.c#dbcool_get_temp_limits to stop
reading the limits from the chip, which I don't think is supported.
It doesn't report any limit for sensor0 (l_temp), so there must be a way of
having "no limit".

Did you find out what was causing the limit to be low in the first place?
The low CritMax the other thread was about? No. It must be the BIOS, but
there's nothing I can sensibly do about that.

I am curious about your resolution -- when you modify these limits
through envsys, the values are supposed to get written back into the
chip (http://bxr.su/n/sys/dev/i2c/dbcool.c#dbcool_set_temp_limits).

Do they persist across reboots and cold boots?  Or does CritMax
change back to 54degC at certain times?

Why do you want to remove the limit, instead of setting it to an actual
reasonable value?
I did set the CritMax limit to 70 degC in envsys.conf.


This thread is about a CritMin CPU Temp limit of 45 degC.
I don't think there's a sensible minimal CPU temperature. Maybe AMD specifies
a minimum temperature of, say, -20 degC for the Opteron 246, but, what shall
I do if I fall below that? Start some idle jobs to heat the CPU?

It depends on the environment.  What if someone's trying to use liquid
nitrogen to cool down and steal your memory modules together with
their content (like passwords and private keys)?  Or your datacentre
is in the North, and a wall or window broke down, and a snow storm is
coming? :-)


Of course, I can set the limit to some ridiculously low value, but that's
still displayed in envstat output.

And did you actually do it?

The PROP_CRITMIN property is reset in:
  http://BXR.SU/NetBSD/sys/dev/i2c/dbcool.c#dbcool_get_limits

And is only conditionally set back in:
  http://BXR.SU/NetBSD/sys/dev/i2c/dbcool.c#dbcool_get_temp_limits

The condition is that the value in the register is not one of the lowest possible values allowed by the chip for that register (depending on the chip and configuration):

        if (lo_lim > 0x01) {
            lims->sel_critmin = lo_lim - sc->sc_temp_offset;
            *props |= PROP_CRITMIN;

        if (lo_lim != 0x80 && lo_lim != 0x81) {

For example, if you attempt to set it to -128 (0x80) or -127 (0x81), or to -63 (0x01) on chips like the ADT7466 with the extended temperature range of -64degC to +191degC enabled, then PROP_CRITMIN should be gone. The same applies to any value lower than the applicable one above, because the driver automatically scales it up to the lowest possible register value.
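
To make the two tests above concrete, here is a minimal sketch. It is not code from dbcool.c: the helper name critmin_reported is hypothetical, and it assumes that the first test is the one used when the driver has a nonzero temperature offset (extended range) and the second one otherwise.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of dbcool.c.
 * Returns true if a raw low-limit register value would be reported
 * as a CritMin limit, following the two quoted conditions.
 * Assumption: a nonzero temp_offset selects the extended-range test.
 */
bool
critmin_reported(uint8_t lo_lim, int temp_offset)
{
	if (temp_offset != 0) {
		/* Extended range: 0x00 and 0x01 are treated as "no limit". */
		return lo_lim > 0x01;
	}
	/* Otherwise 0x80 (-128) and 0x81 (-127) are treated as "no limit". */
	return lo_lim != 0x80 && lo_lim != 0x81;
}
```

Under these assumptions, a register value of 0x2d (45 degC) is reported as CritMin, while 0x80 and 0x81 are not.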



No, depending on your chip, you might be limited to 0degC on the low
end, which is not that ridiculously low.

Correction: not 0degC, but -64degC when the extended temperature range is in effect:

            sc->sc_temp_offset = 64;

            sc->sc_temp_offset = 64;

Which, as per the prior point above, would actually disable the monitoring, both within the chip and envsys on NetBSD.


http://bxr.su/n/sys/dev/i2c/dbcool.c#dbcool_set_temp_limits

        if (sc->sc_temp_offset) {
            limit += sc->sc_temp_offset;
            if (limit < 0)
                limit = 0;
            else if (limit > 255)
                limit = 255;

But some chips supported by dbcool let you go down to -127degC (at
least as far as the register settings go!), which is still not /that/
ridiculously low when you consider the absolute zero of -273.15degC.
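
As an illustration of the offset-and-clamp step quoted from dbcool_set_temp_limits, here is a small standalone sketch. The function name extended_limit_to_register is hypothetical, and it covers only the case of a nonzero offset, since that is the only branch quoted in this message.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of dbcool.c.
 * Converts a requested limit in degC to a raw register value for the
 * extended-range case, mirroring the quoted offset-and-clamp logic.
 * Assumption: temp_offset is nonzero (64 in the quoted code).
 */
uint8_t
extended_limit_to_register(int limit_degc, int temp_offset)
{
	int limit = limit_degc + temp_offset;

	/* Clamp to what an 8-bit register can hold. */
	if (limit < 0)
		limit = 0;
	else if (limit > 255)
		limit = 255;
	return (uint8_t)limit;
}
```

With an offset of 64, any request at or below -64 degC ends up as register value 0x00, -63 degC becomes 0x01, and 45 degC becomes 109 (0x6d). Going by the condition quoted earlier, 0x00 and 0x01 are the values for which CritMin should no longer be reported.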


One could argue that a CPU temp reading below 45 degC must be a sensor failure,
so it's OK to shut down.

Hardly; but it certainly depends on the CPU, system load and the
ambient temperature.

C.



