Subject: Re: time issues in 4.0 rc4 on a PC-ENGINES WRAP (SC1100)
To: David Lord <netbsd@lordynet.org>
From: theo borm <theo_nbsdhelp@borm.org>
List: netbsd-help
Date: 11/29/2007 13:01:21
David Lord wrote:

>On 29 Nov 2007, at 0:09, theo borm wrote:
>
>>Hi,
>>
>>I seem to have two problems with NetBSD 4.0 rc4, one in the kernel and
>>(putatively) one in the userland, and was wondering if anyone has a clue
>>what to do about them.
>>
>>
>>1) Running a 4.0 rc4 kernel on my PC-ENGINES WRAP board causes the
>>clock to run approximately 37 times too slow.
>>
>>The kernel is a custom one (derived from the SOEKRIS one) with options 
>>TIMER_FREQ=1189200
>
>I have a few different motherboards that needed option TIMER_FREQ to
>bring the clock frequency within ntpd's capture range. When I last
>updated them in August, I found that with the same kernel config as
>before the frequency offset was way out and adjustments to TIMER_FREQ
>no longer had the desired effect; when I removed option TIMER_FREQ I
>got a near-zero frequency offset. It seems there is now some working
>auto-calibration.
>
>Probably worth trying without that option but 37x slow might be a 
>different problem.
>
>David

Recompiling /without/ the TIMER_FREQ option seems equivalent to
changing the value to the default (1193182 Hz).
The system clock runs a little slower WITHOUT the TIMER_FREQ option
(about 39x too slow) than with TIMER_FREQ=1189200 (about 37x too slow).
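
For scale, a small sketch comparing how much the constant changes with how much the observed slowdown changes. The 39x and 37x figures above are approximate, so this is only a rough comparison; the variable names are made up for the illustration.

```python
# Compare the relative change in TIMER_FREQ with the relative change
# in the observed slowdown (both slowdown figures are approximate).
DEFAULT_FREQ = 1193182   # Hz, default value quoted above
OPTION_FREQ = 1189200    # Hz, value from the custom kernel config

freq_change = DEFAULT_FREQ / OPTION_FREQ - 1.0   # change in the constant
slowdown_change = 39 / 37 - 1.0                  # change in observed slowdown

print(f"TIMER_FREQ changes by       {freq_change:.2%}")
print(f"observed slowdown changes by {slowdown_change:.2%}")
```

The constant moves by about a third of a percent while the slowdown moves by several percent, so the two may not be related in a simple proportional way, or the difference may just be measurement noise.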


I took the plunge and compiled a kernel with what I thought was a
reasonable correction factor (divide by 40) applied to
TIMER_FREQ=1189200, i.e. TIMER_FREQ=29730. This produced a kernel whose
clock runs like mad. Its boot messages:

cpu0: features 808131<FPU,TSC,MSR,CX8>
WARNING: broken TSC disabled
timecounter: Timecounters tick every 10.000 msec
timecounter: Timecounter "i8254" frequency 29730 Hz quality 100
timecounter: Timecounter "TSC" frequency 6652760 Hz quality 800
timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
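
A rough sketch of what the divide-by-40 setting may do to the interrupt rate. Two assumptions, neither checked against the 4.0 source: that the kernel programs the i8254 with a reload count of TIMER_FREQ/HZ rounded to nearest, and that the hardware counter really runs at the standard 1193182 Hz. The names timer_div and REAL_FREQ are made up for the illustration.

```python
HZ = 100  # clock interrupts per second, from "tick every 10.000 msec"

# Assumption: the hardware counter runs at the standard PC rate.
REAL_FREQ = 1193182  # Hz

def timer_div(timer_freq, hz=HZ):
    """Reload count, assuming round-to-nearest of timer_freq / hz."""
    return (timer_freq + hz // 2) // hz

NOMINAL = 1189200        # Hz, original TIMER_FREQ
SCALED = NOMINAL // 40   # Hz, the divide-by-40 kernel

for freq in (NOMINAL, SCALED):
    count = timer_div(freq)
    rate = REAL_FREQ / count  # interrupts/s the hardware would deliver
    print(f"TIMER_FREQ={freq:>7}: reload count {count:>5}, "
          f"about {rate:7.1f} interrupts/s")
```

Under these assumptions the scaled kernel asks for a reload count of 297 and would see roughly 4000 clock interrupts per second instead of 100. SCALED also matches the 29730 Hz in the i8254 line above.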

After a while a trickle of the following errors:
wdc0:0:0: lost interrupt
        type: ata tc_bcount: 16384 tc_skip: 0
wd0a: device timeout reading fsbn 7107072 of 7107072-7107103 (wd0 bn 7107135; cn 3470 tn 17 sn 31), retrying
wd0: soft error (corrected)

turn into a stream of:

wdc0 channel 0: reset failed for drive 0
wdc0:0:0: wait timed out
wd0a: device timeout writing fsbn 2501312 of 2501312-2501343 (wd0 bn 2501375; cn 1221 tn 23 sn 31), retrying

(note: these errors do NOT occur with a kernel with a "more normal" 
TIMER_FREQ)

plus erratically blinking LEDs on all LAN ports.

sleep 1 also definitely sleeps for LESS than one second now.

In other words: definitely *not* good.


After this I took a more cautious approach, and compiled a few kernels 
with a range of settings:

TIMER_FREQ=1189200/2=594600
-> time/date runs ~ 18 times too slow
-> sleep 120 takes 60 seconds
timecounter: Timecounter "i8254" frequency 594600 Hz quality 100
timecounter: Timecounter "TSC" frequency 132888380 Hz quality 800

TIMER_FREQ=1189200/4=297300
-> time/date runs ~ 8 times too slow
-> sleep 240 takes 60 seconds
timecounter: Timecounter "i8254" frequency 297300 Hz quality 100
timecounter: Timecounter "TSC" frequency 66434860 Hz quality 800

TIMER_FREQ=1189200/8=148650
-> time/date runs ~ 3.2 times too slow
-> sleep 480 takes 60 seconds
timecounter: Timecounter "i8254" frequency 148650 Hz quality 100
timecounter: Timecounter "TSC" frequency 33226590 Hz quality 800

TIMER_FREQ=1189200/16=74325
-> time/date runs ~ 1.15 times too slow
-> sleep 960 takes 60 seconds
timecounter: Timecounter "i8254" frequency 74325 Hz quality 100
timecounter: Timecounter "TSC" frequency 16614960 Hz quality 800
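
The four measurements above can be tabulated with a short sketch. It reads "takes 60 seconds" as wall-clock time (an assumption) and compares the observed time/date slowdown with a naive linear guess of 37 divided by the divisor. The function summarize and the constant BASE_SLOWDOWN are made up for the illustration.

```python
BASE_SLOWDOWN = 37  # approximate slowdown with undivided TIMER_FREQ=1189200

# (divisor, sleep seconds requested, seconds taken, observed date slowdown)
runs = [
    (2, 120, 60, 18.0),
    (4, 240, 60, 8.0),
    (8, 480, 60, 3.2),
    (16, 960, 60, 1.15),
]

def summarize(runs, base=BASE_SLOWDOWN):
    """Return (divisor, sleep speed-up, linear guess, observed) per run."""
    rows = []
    for div, requested, took, observed in runs:
        sleep_speedup = requested / took   # how much too fast sleep runs
        linear_guess = base / div          # slowdown if it scaled with 1/div
        rows.append((div, sleep_speedup, linear_guess, observed))
    return rows

for div, speedup, guess, observed in summarize(runs):
    print(f"/{div:<2}  sleep {speedup:4.1f}x too fast   "
          f"date slowdown: linear guess {guess:5.2f}x, observed {observed:5.2f}x")
```

The sleep speed-up equals the divisor in every run, while the observed date slowdown drops faster than the linear guess as the divisor grows. The sketch only restates the numbers; it does not explain either effect.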


In the last case (TIMER_FREQ=74325) these errors are back:

wdc0:0:0: lost interrupt
        type: ata tc_bcount: 16384 tc_skip: 0
wd0a: device timeout reading fsbn 7107072 of 7107072-7107103 (wd0 bn 7107135; cn 3470 tn 17 sn 31), retrying
wd0: soft error (corrected)
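
A further cross-check on the timecounter lines quoted for each kernel, as a sketch: the reported TSC frequency appears to track whatever TIMER_FREQ is set to. The code below computes the TSC/i8254 ratio for each kernel and, under the assumption that the hardware counter really runs at the standard 1193182 Hz, the TSC rate that ratio would imply. The names tsc_ratios, implied_tsc_hz and STANDARD_I8254 are made up for the illustration.

```python
# (i8254 Hz, TSC Hz) pairs copied from the boot messages above
pairs = [
    (29730, 6652760),
    (594600, 132888380),
    (297300, 66434860),
    (148650, 33226590),
    (74325, 16614960),
]

# Assumption: the hardware counter actually runs at the standard PC rate.
STANDARD_I8254 = 1193182  # Hz

def tsc_ratios(pairs):
    """TSC ticks reported per i8254 tick, for each kernel."""
    return [tsc / i8254 for i8254, tsc in pairs]

def implied_tsc_hz(pairs, real_i8254=STANDARD_I8254):
    """TSC rate implied if the i8254 really ticks at real_i8254 Hz."""
    return [ratio * real_i8254 for ratio in tsc_ratios(pairs)]

for (i8254, _), ratio, hz in zip(pairs, tsc_ratios(pairs), implied_tsc_hz(pairs)):
    print(f"i8254 {i8254:>7} Hz: ratio {ratio:7.2f}, "
          f"implied TSC {hz / 1e6:6.1f} MHz")
```

The ratio stays between 223 and 224 for all five kernels, which would fit the TSC being calibrated against the i8254 using the configured TIMER_FREQ. The implied TSC rate comes out at about 267 MHz in each case, but only if the 1193182 Hz assumption holds.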


Is it safe to conclude that this is a bug in the kernel?


regards,

Theo