ntpq flakiness

Thu Apr 11 01:24:16 UTC 2019

Hal Murray <hmurray at megapathdsl.net>:
> 
> > It's one of the few times I've gone on an expedition like that and completely
> > failed.  Whatever it is, it's not going to be obvious.
> 
> Here is an interesting possibility.  How about the code is working as designed 
> but the parameters are set wrong.  Maybe not "wrong".  How about "not 
> aggressive enough for crappy conditions"?

It could be.  I hope so.  I'd sure like to have that unknown off my mind.

> I think you said it did one retransmission after 5 seconds.  Can you easily 
> patch that to be 3 or adjustable from the command line?  It should double the 
> time each retry, but you can start lower if you collect a few samples to learn 
> what to expect.

That's this code just after line 250 of packet.py:

# Requests are automatically retried once, so total timeout with no
# response is a bit over 2 * DEFTIMEOUT, or 10 seconds.  At the other
# extreme, a request eliciting 32 packets of responses each for some
# reason nearly DEFSTIMEOUT seconds after the prior in that series,
# with a single packet dropped, would take around 32 * DEFSTIMEOUT, or
# 93 seconds to fail each of two times, or 186 seconds.
# Some commands involve a series of requests, such as "peers" and
# "mrulist", so the cumulative timeouts are even longer for those.
DEFTIMEOUT = 5000
DEFSTIMEOUT = 3000
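
For reference, here is the arithmetic from that comment worked through with
the two constants as given. The function names are made up for illustration
and are not in packet.py. Note that 32 * DEFSTIMEOUT comes to 96 seconds per
attempt; the 93 in the comment matches 31 intervals, so treat the figures as
approximate.

```python
# Values copied from the constants quoted above (milliseconds).
DEFTIMEOUT = 5000
DEFSTIMEOUT = 3000


def no_response_total_ms(primary_ms=DEFTIMEOUT, attempts=2):
    """Total wait when nothing ever comes back: first try plus one retry."""
    return primary_ms * attempts


def slow_series_total_ms(packets=32, secondary_ms=DEFSTIMEOUT, attempts=2):
    """Worst case where each of `packets` responses arrives just under the
    secondary timeout and the whole series fails on every attempt."""
    return packets * secondary_ms * attempts


print(no_response_total_ms() / 1000)   # seconds with no response at all
print(slow_series_total_ms() / 1000)   # seconds for the slow-series case
```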

I'd push a trial patch to the repo, but you can try that mod locally
just as easily and that way we only change the commit history if it's a
good idea.
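
If it helps to see what a doubling schedule would cost before touching
anything, here is a minimal sketch. It is not the packet.py code; the
function name, the 3000 ms starting value, and the retry count are all
assumptions for illustration.

```python
def retry_timeouts_ms(initial_ms=3000, retries=1, factor=2):
    """Return the timeout for the first attempt and for each retry,
    multiplying by `factor` every time (hypothetical helper)."""
    if initial_ms <= 0 or retries < 0 or factor < 1:
        raise ValueError("initial_ms must be > 0, retries >= 0, factor >= 1")
    return [initial_ms * factor ** n for n in range(retries + 1)]


# Start at 3 seconds and allow two retries instead of one.
schedule = retry_timeouts_ms(initial_ms=3000, retries=2)
print(schedule)              # per-attempt timeouts in milliseconds
print(sum(schedule) / 1000)  # total seconds before giving up
```

Starting lower and doubling means a lost packet is noticed sooner, while the
later, longer waits still cover a slow link.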
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>