Subject: port-alpha/24440: too long at high IPL/SPL in -current results in bad NTP clock skew
To: None <gnats-bugs@gnats.NetBSD.org>
From: None <fair@netbsd.org>
List: netbsd-bugs
Date: 02/15/2004 16:48:34
>Number:         24440
>Category:       port-alpha
>Synopsis:       too long at high IPL/SPL in -current results in bad NTP clock skew
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    port-alpha-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Feb 16 00:57:01 UTC 2004
>Closed-Date:
>Last-Modified:
>Originator:     Erik E. Fair
>Release:        NetBSD 1.6ZJ
>Organization:
International Organization of Internet Clock Watchers
>Environment:
	
	
System: NetBSD timex.clock.org 1.6ZJ NetBSD 1.6ZJ (MIATA-GL) #7: Sat Feb 14 12:15:13 PST 2004 root@timex.clock.org:/usr/obj/sys/arch/alpha/compile/MIATA-GL alpha
Architecture: alpha
Machine: alpha

Loaded initial symtab at 0xfffffc0000770048, strtab at 0xfffffc00007af078, # entries 10690
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 1.6ZJ (MIATA-GL) #7: Sat Feb 14 12:15:13 PST 2004
	root@timex.clock.org:/usr/obj/sys/arch/alpha/compile/MIATA-GL
Digital Personal WorkStation 600au, 598MHz, s/n 
8192 byte page size, 1 processor.
total memory = 1024 MB
(1896 KB reserved for PROM, 1022 MB used by NetBSD)
avail memory = 999 MB
mainbus0 (root)
cpu0 at mainbus0: ID 0 (primary), 21164A-0
cpu0: VAX FP support, IEEE FP support, Primary Eligible
cpu0: Architecture extensions: 1<BWX>
cia0 at mainbus0: DECchip 2117x Core Logic Chipset (Pyxis), pass 1
cia0: extended capabilities: 1<BWEN>
cia0: using BWX for PCI config access
pci0 at cia0 bus 0
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
tlp0 at pci0 dev 3 function 0: DECchip 21143 Ethernet, pass 3.0
tlp0: interrupting at dec 550 irq 0
tlp0: DEC , Ethernet address 00:00:f8:76:21:19
nsphy0 at tlp0 phy 5: DP83840 10/100 media interface, rev. 1
nsphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
tlp0: 10baseT, 10baseT-FDX, 10base2, 10base5
sio0 at pci0 dev 7 function 0: Contaq Microsystems 82C693 PCI-ISA Bridge (rev. 0x00)
cypide0 at pci0 dev 7 function 1
cypide0: Cypress 82C693 IDE Controller (rev. 0x00)
cypide0: bus-master DMA support present
cypide0: primary channel wired to compatibility mode
cypide0: primary channel interrupting at isa irq 14
atabus0 at cypide0 channel 0
cypide1 at pci0 dev 7 function 2
cypide1: Cypress 82C693 IDE Controller (rev. 0x00)
cypide1: hardware does not support DMA
cypide1: primary channel wired to compatibility mode
cypide1: secondary channel interrupting at isa irq 15
atabus1 at cypide1 channel 0
ohci0 at pci0 dev 7 function 3: Contaq Microsystems 82C693 PCI-ISA Bridge (rev. 0x00)
ohci0: interrupting at isa irq 10
ohci0: OHCI version 1.0, legacy support
usb0 at ohci0: USB revision 1.0
uhub0 at usb0
uhub0: Contaq Microsys OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
vga0 at pci0 dev 11 function 0: ATI Technologies Radeon 7500 (rev. 0x00)
wsdisplay0 at vga0 (kbdmux ignored): console (80x25, vt100 emulation)
ppb0 at pci0 dev 20 function 0: Digital Equipment DECchip 21152 PCI-PCI Bridge (rev. 0x02)
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled, rd/line, wr/inv ok
isp0 at pci1 dev 4 function 0: QLogic 1020 Fast Wide SCSI HBA
isp0: interrupting at dec 550 irq 3
scsibus0 at isp0: 16 targets, 8 luns per target
fwohci0 at pci1 dev 9 function 0: Lucent Technologies FW322/323 IEEE 1394 OHCI Controller (rev. 0x04)
fwohci0: interrupting at dec 550 irq 16
fwohci0: OHCI 1.0, 00:60:1d:00:00:02:13:3a, 400Mb/s, 2048 max_rec, 8 ir_ctx, 8 it_ctx
bktr0 at pci1 dev 10 function 0
bktr0: interrupting at dec 550 irq 20
bktr0: Pinnacle/Miro TV, Temic NTSC tuner, dbx stereo.
isa0 at sio0
lpt0 at isa0 port 0x3bc-0x3bf irq 7
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
alpha_shared_intr_establish: isa irq 3: warning: using edge-triggered on level-triggered
pckbc0 at isa0 port 0x60-0x64
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0 (mux ignored): console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pms0 (mux ignored)
ess0 at isa0 port 0x220-0x22f irq 5 drq 1,5
ess0: ESS Technology ES1887 [version 0x688b]
ess0: audio1 interrupting at irq 5
ess0: audio2 polled
audio0 at ess0: full duplex, mmap, independent
opl0 at ess0: model OPL3
midi0 at opl0: ESS Yamaha OPL3
pcppi0 at isa0 port 0x61
midi1 at pcppi0: PC speaker
spkr0 at pcppi0
isabeep0 at pcppi0
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
mcclock0 at isa0 port 0x70-0x71: mc146818 or compatible
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
IPsec: Initialized Security Association Processing.
stray isa irq 15
stray isa irq 15
scsibus0: waiting 2 seconds for devices to settle...
stray isa irq 15
stray isa irq 15
atapibus0 at atabus1: 2 targets
fw0 at fwohci0: 00:60:1d:00:00:02:13:3a:0a:02:ff:ff:f0:01:00:00
cd0 at atapibus0 drive 0: <TOSHIBA CD-ROM XM-6202B, b\221\311\373\000, 1110> cdrom removable
cd0: 32-bit data port
cd0: drive supports PIO mode 4, DMA mode 2
cd0(cypide1:0:0): using PIO mode 4
sd0 at scsibus0 target 0 lun 0: <IBM, DMVS36V, 0250> disk fixed
sd0: 35003 MB, 11739 cyl, 20 head, 305 sec, 512 bytes/sect x 71687340 sectors
sd0: sync (50.00ns offset 8), 16-bit (40.000MB/s) transfers, tagged queueing
ustir0 at uhub0 port 2 configuration 1 interface 0
ustir0: Sigmatel Inc IrDA/USB Bridge, rev 1.10/0.08, addr 2
irframe0 at ustir0: SIR, MIR, FIR
root on sd0a dumps on sd0b
mountroot: trying nfs...
mountroot: trying msdos...
mountroot: trying cd9660...
mountroot: trying ffs...
readclock: 4/2/14/21/17/1=>1076793421 (1076793202)
root file system type: ffs
init: copying out path `/sbin/init' 11
wsdisplay0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x25, vt100 emulation)
wsdisplay0: screen 3 added (80x25, vt100 emulation)
wsdisplay0: screen 4 added (80x25, vt100 emulation)

% vmstat -i
interrupt                                     total     rate
soft net                                     389485        3
soft clock                                   111524        1
cpu0 clock                                101373190     1024
cpu0 device                                  776785        7
dec_550 irq 0                                725528        7
dec_550 irq 3                                 41445        0
dec_550 irq 16                                    3        0
isa irq 1                                      9772        0
isa irq 6                                         1        0
isa irq 10                                       28        0
isa irq 15                                        8        0
fwohci0 intr                                      2        0
Total                                     103427771     1044

>Description:
	This Alpha Personal Workstation (PWS) 600au (MIATA-GL, 600
	MHz 21164A EV56 CPU) used to keep good time. With an updated
	kernel, and with the machine absolutely idle for 12 hours,
	the clock skew has become bad enough that ntpd logged this:

Feb 14 13:25:09 timex ntpd[534]: ntpd 4.2.0-r Wed Feb  4 13:39:16 UTC 2004 (1)
Feb 14 13:25:10 timex ntpd[534]: precision = 2.000 usec
Feb 14 13:25:10 timex ntpd[534]: kernel time sync status 0040
Feb 14 13:25:10 timex ntpd[534]: frequency initialized 16.321 PPM from /var/db/ntp.drift
Feb 14 13:28:27 timex /netbsd: setclock: 4/2/14/21/28/27
Feb 14 13:28:27 timex ntpd[534]: time reset +0.239176 s
Feb 14 13:28:27 timex ntpd[534]: kernel time sync disabled 0041
Feb 14 13:34:52 timex ntpd[534]: kernel time sync enabled 0001
Feb 15 16:06:02 timex /netbsd: setclock: 4/2/16/0/6/2
Feb 15 16:06:02 timex ntpd[534]: time reset -0.372314 s
Feb 15 16:27:33 timex /netbsd: setclock: 4/2/16/0/27/33
Feb 15 16:27:33 timex ntpd[534]: time reset +1.016689 s
Feb 15 16:27:33 timex ntpd[534]: frequency error 512 PPM exceeds tolerance 500 PPM
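
	For scale, here is a rough back-of-the-envelope sketch of what
	a frequency error in PPM means in clock ticks. It takes the
	"cpu0 clock" rate of 1024 per second from the vmstat -i output
	above and assumes a constant, uncorrected error; that the skew
	comes from missed clock interrupts is this report's hypothesis,
	not something the script shows. The function names are made up
	for illustration.

```python
# Back-of-the-envelope: what a frequency error in PPM means in clock ticks.
# Assumption: a constant error, and HZ taken from the "cpu0 clock" rate
# reported by vmstat -i on this machine.
HZ = 1024


def ticks_off_per_second(ppm, hz=HZ):
    """Clock ticks gained or lost per real second at the given error."""
    return ppm * 1e-6 * hz


def seconds_off(ppm, interval_s):
    """Accumulated clock error in seconds over interval_s seconds."""
    return ppm * 1e-6 * interval_s


if __name__ == "__main__":
    # 16.321 PPM is the stored drift value, 500 PPM is ntpd's tolerance,
    # and 512 PPM is the error ntpd complained about.
    for ppm in (16.321, 500.0, 512.0):
        print(f"{ppm:8.3f} PPM: {ticks_off_per_second(ppm):.4f} ticks/s, "
              f"{seconds_off(ppm, 12 * 3600):.3f} s per 12 hours")
```

	At 512 PPM and 1024 Hz this is roughly half a clock tick per
	second, or about 22 seconds over 12 hours if that error were
	sustained and uncorrected.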

>How-To-Repeat:
	Run ntpd on an Alpha running -current. Observe ntpd stepping
	the time of day (ToD) far more often than it should, given
	DEC's generally good clock hardware.
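
	One way to watch for this is to pull the "time reset" steps
	out of the syslog output. The following is a minimal sketch,
	assuming ntpd log lines in the same format as those quoted in
	the Description; the helper name and the sample data are
	illustrative only.

```python
import re

# Matches lines such as:
#   Feb 15 16:27:33 timex ntpd[534]: time reset +1.016689 s
RESET_RE = re.compile(r"ntpd\[\d+\]: time reset ([+-]\d+\.\d+) s")

# Sample lines copied from the log in this report.
SAMPLE = """\
Feb 14 13:28:27 timex ntpd[534]: time reset +0.239176 s
Feb 14 13:28:27 timex ntpd[534]: kernel time sync disabled 0041
Feb 15 16:06:02 timex ntpd[534]: time reset -0.372314 s
Feb 15 16:27:33 timex ntpd[534]: time reset +1.016689 s
"""


def time_resets(lines):
    """Return the step adjustments (in seconds) found in the given log lines."""
    resets = []
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            resets.append(float(match.group(1)))
    return resets


if __name__ == "__main__":
    for step in time_resets(SAMPLE.splitlines()):
        print(f"ntpd stepped the clock by {step:+.6f} s")
```

	On a healthy machine this list should stay empty, or nearly
	so, after ntpd's initial synchronization.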

>Fix:
	Carefully review NetBSD/alpha's drivers and interrupt
	handlers, with the aim of minimizing the time (and
	instructions) spent at high IPL/SPL. NTP is the canary in
	the coal mine here: such work should also pay off in better
	I/O performance and system responsiveness generally.
>Release-Note:
>Audit-Trail:
>Unformatted: