tech-kern archive


Re: dbcool, envsys, powerd shutting down my machine



Hi,

> envstat -d dbcool0 says:
>            Current  CritMax  WarnMax  WarnMin  CritMin  Unit
> [...]
> r2_temp:    53.250   54.000                     45.000 degC
> [...]

> If I read that correctly, it means that at 54 degC, it's time for emergency 
> shut-down, while only at 57 degC, fans have to run at full speed.
> (Also, it seems to be threatening the hardware if that temp falls below 45.)
> 
> I have no clue where that magic value of 54 degC comes from. It's not in any 
> config file I can find, I don't find such a value in sys/dev/i2c/dbcool.c. 
> Is it the BIOS writing that value into the IC? Is it a chip manufacturer 
> default?

The driver displays the limit values that it reads from the sensor chip [*],
so this chip is currently set to a maximum of 54'C and a minimum of 45'C.
In this case, the critical maximum does seem too close to the current
value.  The chip default is probably around 80'C [#], so this lower limit
was most likely programmed in at boot time.
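
To put a number on "too close", here is a minimal sketch of the headroom
arithmetic.  The three values are copied from the quoted envstat output;
the function name is just something I made up for illustration.

```python
# Values for r2_temp copied from the quoted envstat output (degC).
current = 53.25
crit_max = 54.0
crit_min = 45.0


def headroom(reading, limit):
    """Degrees left before the reading reaches the given upper limit."""
    return limit - reading


margin = headroom(current, crit_max)
print(f"headroom to CritMax: {margin:.2f} degC")
print(f"distance above CritMin: {current - crit_min:.2f} degC")
```

With under one degree of headroom, a small load change is enough to cross
the critical maximum.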

> The board is a Tyan S2882-D, in case that matters.
> (Btw., does anyone know what r2_temp on that board is?)

The chip has an onboard temperature sensor (l_temp) and two remote sensor
connections (r1_temp, r2_temp).  The "Hardware Health Event Monitoring"
section of the motherboard manual [+] lists the monitored temperatures as
System, CPU1 VRM, CPU1, CPU2, and CPU2 VRM, so I would guess that this is
one of the VRM temperatures (it's too high for ambient/system, and too low
for CPU).  Is there a dbcool1 r2_temp with a similar value and limits?

> After running a customized kernel, I found a server powered down.
> The culprit turned out to be dbcool->envsys->powerd fabulating some temperature
> rose above limits.

> I turned off powerd for now.

You can alter the limits from software - see the dbcool manual page for how
to set up envsys.conf.  Maybe increasing the value by 3'C or so would be OK.
Also, I see that the latest BIOS version appears to be 3.09 - is that the
version that you have?  There might be fixes for the temperature limits,
although they're not mentioned in the release notes.
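
As a rough sketch of the "3'C or so" suggestion, the snippet below works
out the raised limit and prints a stanza in the general shape that
envsys.conf uses.  Treat the details as assumptions: "sensorN" is a
placeholder (I don't know which sensor index r2_temp maps to on your
machine), the helper name is made up, and the exact property syntax should
be checked against the envsys.conf and dbcool manual pages before use.

```python
OLD_CRIT_MAX_C = 54.0   # limit currently reported by envstat for r2_temp
RAISE_BY_C = 3.0        # the "3'C or so" suggested above


def envsys_stanza(device, sensor, crit_max_c):
    """Format a device block that sets a critical-max limit in Celsius.

    The layout follows my understanding of envsys.conf; verify it against
    the manual page.  'sensor' is whatever sensor name applies locally.
    """
    return (
        f"{device} {{\n"
        f"\t{sensor} {{ critical-max = {crit_max_c:g}C; }}\n"
        f"}}\n"
    )


new_crit_max = OLD_CRIT_MAX_C + RAISE_BY_C
stanza = envsys_stanza("dbcool0", "sensorN", new_crit_max)
print(stanza)
```

Replace the placeholder sensor name with the real one before putting
anything like this into envsys.conf.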

Regards,

J

[*] http://nxr.netbsd.org/xref/src/sys/dev/i2c/dbcool.c#1935
[#] See the appropriate chip datasheet referenced from .../dbcool.c#35
[+] ftp://ftp.tyan.com/manuals/m_s2882d_100.pdf

-- 
   My other computer runs NetBSD too   -         http://www.netbsd.org/

