Port-sparc64 archive
Re: envstat drivers on SUN Ultra-45?
Hi,
> > https://ftp.netbsd.org/pub/NetBSD/misc/jdc/sparc64/
>
> Thanks, grabbed & tested.
>
> > "u45.diff" is the patch file, "netbsd" and "netbsd.gdb" is the kernel with
> > these and "u45-envsys.conf" is the sensor names for /etc/envsys.conf.
> > "u45.dmesg" is the full dmesg, but relevant lines are:
>
> Just grabbed kernel image and envsys config, applied and rebooted:
> pci-fan: 0 RPM
> system-fan3: 0 RPM
> system-fan4: 0 RPM
>
> And yes, the 3x fan group (one FRU, two fans pointed at CPUs and memory
> banks, one at the PCI(e,-X) cards) not running is something I suspected
> (system thermal issues), but it's nice to be able to check for it, now
> I can figure out how to fix it. I think I misrouted at least one cable
> when I replaced the mainboard, thus causing mechanical alignment issues
> for the fan group connector.
It's nice to see the confirmation from the software! The service manual
(819-1892-12.pdf) has a reasonable overview at the start of section 3.1
showing all the motherboard connections. I hope that it's easy to solve.
If not, I can take pictures of the fan cable route on mine.
Your CPU temperatures look normal:
> cpu0-sensor: 64.500 degC
> cpu1-sensor: 60.250 degC
I guess that means that I do need to redo the thermal paste on mine (not
looking forward to that on something hard/expensive to replace).
> > [ 1.000000] bq4802rtc0 at ebus0 addr 100000-10000f: real time clock
When I was testing today, I saw:
[ 40.6304158] WARNING: clock lost 2 days
and I double-checked the chip documentation. I see that I had inverted the
"stop" bit, so the driver stopped the clock whenever the power was off.
I've corrected this now and uploaded a new kernel and patches:
https://ftp.netbsd.org/pub/NetBSD/misc/jdc/sparc64/netbsd
https://ftp.netbsd.org/pub/NetBSD/misc/jdc/sparc64/u45.diff
This does mean that you'll need to try this kernel (or an autobuild one)
to correctly set up the RTC again - apologies for that.
This kernel also has some additions to the adt7462 driver. Limits are
read from the chip, voltages are added and I also check the fan fault
register. This does lead to slightly weird results though.
Current CritMax WarnMax WarnMin CritMin Unit
[adt7462sm0]
...
V1.5 1: 1.490 1.989 0.250 V
V1.5 2: 1.498 1.989 V
V3.3 1: 1.238 4.386 V
Vbatt: 1.232 3.978 1.997 V
V12 3: 11.938 15.938 V
V5 3: 5.018 6.630 V
fan fault: 100000 0 0 0 0 none
It looks like Sun wired the 3.3V and the battery sensor to 1.2 volt lines,
but didn't remove the default low limit for the battery. On boot, I see:
[ 33.5647675] adt7462sm0: warning under limit on 'Vbatt'
from the battery voltage low limit, and:
[ 33.6347678] adt7462sm0: fan fault status change: 00000000 -> 00100000
[ 33.7147673] adt7462sm0: critical limit on 'fan fault'
from the fan fault register. Rather than display the hex value, I thought
that it's nice to display a 1 if the fan has failed and 0 otherwise. It
currently reads from the right and tells me that fan 6 has failed. I had
assumed that there were only 5 fans, but it's possible that fan 6 is the
PSU fan and just not shown on Solaris. You should see a value of 011100
or 111100 for your non-working system fans (3, 4 and 5). If you see the
latter, I'll mask out fan 6 from the results.
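To make the digit convention concrete, here is a minimal sketch in Python (not the driver code, which is C in u45.diff). It assumes that bit 0 of the fault register corresponds to fan 1 and that there are 8 fan bits, matching the 8-digit value in the dmesg line above. The function names and the "ignore" parameter are hypothetical, for illustration only.

```python
def fault_digits(mask: int, nfans: int = 8) -> int:
    """Turn a fault bitmask into a decimal number with one digit per fan.

    Assumption: bit 0 is fan 1, so fan 1 ends up as the rightmost digit.
    """
    value = 0
    for bit in range(nfans):
        if mask & (1 << bit):
            value += 10 ** bit  # digit position 'bit' becomes 1
    return value


def failed_fans(mask: int, nfans: int = 8, ignore: tuple = ()) -> list:
    """List the 1-based numbers of the failed fans, skipping any in 'ignore'."""
    return [
        bit + 1
        for bit in range(nfans)
        if mask & (1 << bit) and (bit + 1) not in ignore
    ]


# Only fan 6 flagged, as on my machine.
print(fault_digits(0b00100000))                    # 100000
# Fans 3, 4 and 5 failed; pad to 6 digits to keep the leading zero.
print(f"{fault_digits(0b00011100):06d}")           # 011100
# Fans 3-6 flagged, with fan 6 masked out of the list.
print(failed_fans(0b00111100, ignore=(6,)))        # [3, 4, 5]
```

Because the sensor value is a plain number, a leading 0 is dropped when it is displayed, so 011100 would show up as 11100.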
Finally, trying to pick a name for the sensors (hard :-) I renamed the
lm75a to nxp75a so that it is less likely to be confused with the lm75a
that lmtemp has. This needs a matching change to envsys.conf.
> One other reason for the high thermal readouts besides the non-running
> fan group in my machine is that ambient temperature is ~ 30 degC, which
> is probably close to the upper end of the machine design envelope ;-)
That is a good question - 30 C was around the upper limit for running
servers, but I guess that without the main fans, it's definitely getting
toward the limits!
> Thank you very much for your work improving the sensor support for the
> Ultra-45, I very much appreciate it.
A pleasure! I'm glad that it's useful!
I think that I'm close to getting the changes committed. I was looking at
the dbcool manual page and I see that it has sysctl settings to vary the
automatic fan control settings (Tmin and Trange). I did try manually
adjusting the system fan Trange and I could hear the change in fan sound.
I might commit first and then add that later though.
Regards,
Julian