kern/57920: hardclock(9) contract is unclear about missed ticks

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost
Subject: kern/57920: hardclock(9) contract is unclear about missed ticks
From: campbell+netbsd%mumble.net@localhost
Date: Sat, 10 Feb 2024 20:00:00 +0000 (UTC)

>Number:         57920
>Category:       kern
>Synopsis:       hardclock(9) contract is unclear about missed ticks
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Feb 10 20:00:00 +0000 2024
>Originator:     Taylor R Campbell
>Release:        current
>Organization:
The NetBSD Hardclock
>Environment:
>Description:
Quoth the hardclock(9) man page:

     The hardclock() function is called hz(9) times per second.  It implements
     the real-time system clock.  The argument frame is an opaque, machine-
     dependent structure that encapsulates the previous machine state.

What should happen if the machine-dependent periodic timer interrupt is delayed, or some timer interrupts are missed, but the underlying timer hardware can tell how long the delay was or how many interrupts were missed?

Reasons for this include entering and exiting ddb, suspending and resuming hardware, scheduling delays on virtual hardware, and flaky hardware.

Here are some options if n > 1 periods have elapsed since the last hardclock tick:

1. Call hardclock once, i.e., pretend nothing happened and let the timecounter sort out clock jumps.
2. Call hardclock n times, i.e., try to catch up as fast as we can even if that means hardclocks happen much more often than hz times per second.
3. Call hardclock MIN(n, k) times for some bound k, i.e., try to catch up but by at most k/hz seconds.

Some drivers, like the i8254 driver in arch/x86/isa/clock.c and the Intel local APIC driver in arch/x86/x86/lapic.c, do (1); some drivers, like the PowerPC e500 clock driver in arch/powerpc/booke/e500_timer.c, do (2); other drivers, like the Xen clock driver in arch/xen/xen/xen_clock.c, do (3).  Which should it be?
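
For concreteness, the three options can be sketched as a choice of how many hardclock calls a timer interrupt handler makes once it knows n.  This is only an illustration: catchup_policy, ticks_to_deliver, timer_intr, and fake_hardclock are hypothetical names, not taken from any of the drivers above, and fake_hardclock merely counts calls.

```c
#include <assert.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical names for options (1), (2), and (3). */
enum catchup_policy { CATCHUP_ONCE, CATCHUP_ALL, CATCHUP_CAPPED };

static unsigned hardclock_calls;	/* stand-in: counts fake hardclock calls */

/* Stand-in for hardclock(9); the real one takes a clockframe. */
static void
fake_hardclock(void)
{
	hardclock_calls++;
}

/*
 * Given n >= 1 elapsed periods, return how many times to call
 * hardclock under each policy.  k is the cap used only by option (3).
 */
static unsigned
ticks_to_deliver(enum catchup_policy policy, unsigned n, unsigned k)
{
	assert(n >= 1);
	switch (policy) {
	case CATCHUP_ONCE:	/* (1) pretend nothing happened */
		return 1;
	case CATCHUP_ALL:	/* (2) catch up completely */
		return n;
	case CATCHUP_CAPPED:	/* (3) catch up by at most k ticks */
		return MIN(n, k);
	}
	return 1;
}

/* Hypothetical MD interrupt handler body, after n has been determined. */
static void
timer_intr(enum catchup_policy policy, unsigned n, unsigned k)
{
	unsigned i, calls = ticks_to_deliver(policy, n, k);

	for (i = 0; i < calls; i++)
		fake_hardclock();
}
```

With n = 5 and k = 3, this makes 1, 5, and 3 calls for options (1), (2), and (3) respectively.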
>How-To-Repeat:
code inspection, diagnosing heartbeat issues with ddb on riscv, writing a new clock driver and wondering what to do in this case
>Fix:
Yes, please!

Perhaps hardclock(9) should be extended with an argument saying how many ticks the MD clock driver thinks have elapsed; if >1, it missed some.  We can have the policy about what to do in this case -- dtrace probe, event counter, printf, callout scheduling, whatever -- in MI code, and leave only the mechanism for detecting missed ticks in MD code.
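
A rough sketch of that split, assuming a hypothetical MI entry point hardclock_n and a hypothetical MD handler md_timer_intr (neither exists in the tree; the counters are plain variables standing in for whatever policy MI code would choose, and the time values are in arbitrary timer units):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct clockframe;			/* opaque, as in hardclock(9) */

static uint64_t hardclock_ticks_seen;	/* hypothetical: ticks accounted for */
static uint64_t hardclock_missed_ev;	/* hypothetical: missed-tick counter */

/*
 * Hypothetical MI side: policy lives here.  nticks is how many ticks
 * the MD driver thinks have elapsed; nticks > 1 means some were missed.
 */
static void
hardclock_n(struct clockframe *frame, uint64_t nticks)
{
	assert(nticks >= 1);
	(void)frame;			/* unused in this sketch */
	if (nticks > 1)
		hardclock_missed_ev += nticks - 1;
	hardclock_ticks_seen += nticks;
}

/*
 * Hypothetical MD side: mechanism only.  Works out how many whole
 * periods have elapsed since the last accounted tick and reports it.
 */
static void
md_timer_intr(struct clockframe *frame, uint64_t now, uint64_t *last,
    uint64_t period)
{
	uint64_t nticks = (now - *last) / period;

	if (nticks == 0)
		return;			/* spurious or early interrupt */
	*last += nticks * period;	/* keep the sub-period remainder */
	hardclock_n(frame, nticks);
}
```

The MD handler never decides what to do about missed ticks; it only reports the count, and hardclock_n is the single place where a counter, probe, or catch-up limit would go.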
