Current-Users archive


re: mysql sleep timeouts



On Sep 30, 13:22, matthew green wrote:
} John Nemeth writes:
} >      I'm seeing a problem in MySQL Cluster where sleep timeouts
} > are running too long.  I don't know if this problem also affects
} > regular MySQL, but since they share a lot of code, it is very
} > possible.
} >
} >      After doing a fair bit of digging, I found a function called
} > NdbSleep_MilliSleep().  The primary line in this function is
} > "select(0, nullptr, nullptr, nullptr, &t);", where t is a "struct
} > timeval".  I'm guessing that the select() timeout doesn't provide
} > millisecond-level granularity?  Can somebody confirm?  What would
} > be a better option (hopefully reasonably portable)?
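For reference, the select()-as-sleep pattern described above can be sketched as follows, together with a way to measure how long the call actually blocked. This is a minimal sketch assuming a POSIX system with `clock_gettime(CLOCK_MONOTONIC)`; `ms_to_timeval`, `select_milli_sleep` and `measure_sleep_us` are hypothetical names, not the actual NDB code.

```cpp
#include <sys/select.h>  // select(), struct timeval
#include <time.h>        // clock_gettime(), CLOCK_MONOTONIC

// Hypothetical helper: split a millisecond count into the seconds and
// microseconds fields that select() expects in its timeout.
static struct timeval ms_to_timeval(long ms) {
    struct timeval t;
    t.tv_sec = ms / 1000;
    t.tv_usec = (ms % 1000) * 1000;
    return t;
}

// Hypothetical equivalent of the quoted pattern: with no descriptors to
// watch, select() just blocks until the timeout expires (or a signal
// arrives). Returns 0 on timeout, -1 on error.
static int select_milli_sleep(long ms) {
    struct timeval t = ms_to_timeval(ms);
    return select(0, nullptr, nullptr, nullptr, &t);
}

// Measure, in microseconds, how long a requested sleep actually took, so
// that an overshoot (e.g. 10 ms observed for a 1 ms request) is visible.
static long long measure_sleep_us(long ms) {
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    select_milli_sleep(ms);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (long long)(end.tv_sec - start.tv_sec) * 1000000LL +
           (end.tv_nsec - start.tv_nsec) / 1000;
}
```

Calling `measure_sleep_us(1)` repeatedly on the affected machine would show whether short requests are being rounded up to a clock tick.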
} 
} what's "too long"?  note that you currently can't get sleeps with
} finer resolution than one clock tick (1/hz), i.e., 10ms with the
} default hz of 100, so if you're seeing 10ms instead of 1ms, the only
} workaround for now is to run with HZ=1000 kernels.
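The arithmetic behind that answer: with a tick-based timer a sleep cannot end between ticks, so a request is effectively rounded up to a whole number of ticks of 1/hz seconds. A minimal model under that assumption (it ignores scheduling delay and any extra tick the kernel may add to guarantee the minimum; `effective_sleep_us` is a hypothetical name):

```cpp
// Simplified model of tick-based sleep rounding: a request of `ms`
// milliseconds (ms > 0) on a kernel running at `hz` ticks per second lasts
// at least ceil(ms * hz / 1000) ticks. Scheduling latency and any extra
// tick added by the kernel are deliberately ignored.
static long effective_sleep_us(long ms, long hz) {
    long ticks = (ms * hz + 999) / 1000;  // round up to whole ticks
    return ticks * (1000000 / hz);        // ticks times tick length in us
}
```

Under this model a 1 ms request lasts 10 ms at hz=100 but 1 ms at hz=1000, which is why the HZ=1000 workaround helps.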

     A sampling of log messages (these repeat many times):

2024-05-26 21:58:51 [ndbd] INFO     -- Watchdog: User time: 10705  System time: 2339
2024-05-26 21:58:51 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck in: JobHandling in block: 0, gsn: 0 elapsed=1654
2024-05-26 21:58:51 [ndbd] INFO     -- Watchdog: User time: 10705  System time: 2339
2024-05-26 21:58:51 [ndbd] WARNING  -- Time moved forward with 1678 ms
2024-05-26 21:58:51 [ndbd] WARNING  -- timerHandlingLab, expected 10ms sleep, not scheduled for: 1682 (ms)
2024-05-26 21:58:51 [ndbd] INFO     -- Bursty environment, mean burstiness of 92 pct, some risk of congestion issues
2024-05-26 21:58:53 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck in: JobHandling in block: 0, gsn: 0 elapsed=109
2024-05-26 21:58:53 [ndbd] INFO     -- Watchdog: User time: 10782  System time: 2347
2024-05-26 21:58:53 [ndbd] INFO     -- timerHandlingLab, expected 10ms sleep, not scheduled for: 249 (ms)
2024-05-26 21:58:58 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck in: JobHandling in block: 0, gsn: 0 elapsed=200

} when we have better timers available, the above method should
} work fine: select() already passes microsecond precision, so we'd
} only have to hook it up to the future timer system.
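If a different portable call is preferred, POSIX `nanosleep()` takes a `struct timespec` with a nanosecond field and reports the unslept remainder when a signal interrupts it. This is an alternative not proposed in the thread, sketched under the assumption of a POSIX system; on a tick-based kernel it is still rounded to the clock tick just like select(). `milli_sleep` is a hypothetical name.

```cpp
#include <time.h>   // nanosleep(), struct timespec
#include <cerrno>   // errno, EINTR

// Hypothetical portable millisecond sleep built on POSIX nanosleep().
// If a signal interrupts the sleep, nanosleep() stores the time still left
// in `rem`, and the loop resumes sleeping for just that remainder.
static int milli_sleep(long ms) {
    struct timespec req, rem;
    req.tv_sec = ms / 1000;
    req.tv_nsec = (ms % 1000) * 1000000L;
    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;   // invalid argument or other error
        req = rem;       // resume with the remaining time
    }
    return 0;
}
```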
} 
} (alternatively, if you _need_ this level of precision now, the
} only real way is to hard-spin until time passes.)

      How would one do that from userland?
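One way to hard-spin from userland, as suggested in the quoted reply, is to poll a monotonic clock until the deadline passes. This is a minimal sketch assuming POSIX `clock_gettime(CLOCK_MONOTONIC)` is available; `now_ns` and `spin_sleep_us` are hypothetical names. It keeps a CPU fully busy for the whole wait and can still overshoot if the thread is preempted, so it only makes sense for short waits.

```cpp
#include <time.h>   // clock_gettime(), CLOCK_MONOTONIC

// Current monotonic time in nanoseconds.
static long long now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

// Hypothetical busy-wait: spin until `us` microseconds have elapsed on the
// monotonic clock. Precision is bounded by the clock source's resolution
// and by preemption rather than by the kernel's tick rate, but the CPU
// stays busy throughout.
static void spin_sleep_us(long us) {
    const long long deadline = now_ns() + (long long)us * 1000LL;
    while (now_ns() < deadline) {
        // spin: deliberately do nothing until the deadline passes
    }
}
```

A common refinement is to sleep normally for most of the interval and spin only for the final tick, trading some precision for less wasted CPU.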

}-- End of excerpt from matthew green

