Apparent protocol-machine bug, new top priority

Achim Gratz Stromeko at
Sat Sep 15 18:43:14 UTC 2018

Eric S. Raymond via devel writes:
>>This is still happening with ntpsec-0.9.7+1104, albeit much less often
>>now (but I've removed iburst from the configuration files).  I just had
>>it happen while I was updating the rasPis.  Again, the symptom is that
>>there is a "rate_exceeded" event and the hpoll gets set to some high
>>value and never recovers from there even though the poll interval should
>>be fixed:
>>assoc=53276: conf, reach, sel_reject, 1 event, rate_exceeded
>>unreach=0 hmode=3 pmode=4 hpoll=10 ppoll=4 headway=7908 flash=4096 keyid=0
>>This was and is not happening with NTP classic.
> Now that iburst has been fixed - and Achim reports seeing this problem
> with iburst off - this pretty much has to be an issue deeper in the
> protocol machine.  (I guess we should count our blessings and
> congratulate Daniel that there haven't been more of these since the big
> refactor.)
> If this is happening with iburst *off*, it becomes more difficult to
> understand how the rate limit is being triggered.  I think maybe we
> should start by focusing on something else: why is hpoll not
> recovering after a KOD?
> I'm thinking this sounds like some KOD-recovery logic got lost during
> the refactor.
> I also judge this is our new most serious bug. Daniel, would you give
> it a hard look, please?  You too, Hal - I'm thinking you have better
> odds of diagnosing this one than I do.

This bug is still present in NTPsec.  I've looked around several times,
but I don't really see where it happens, other than the fact that only
a KOD packet will set hpoll directly to 10.  I have not yet found where
hpoll is supposed to be reset once a valid packet exchange happens after
the freeze-out time.  However, if the peer structure gets re-used in
this case, there will never be a chance to recover, since the maxpoll
value has been overwritten with "10" as well.  Had it not been
overwritten, poll_update would eventually re-clamp to maxpoll.  That logic
was changed from NTP classic in d358d266f71b4a608c64ab034d96ecfdc482256d
by D. F. Franke.  The original code used the poll value received in the
packet to replace minpoll (which would eventually also shift maxpoll up
if large enough); the new code uses a hard-coded value of 10.
At least that's how I understand it…  The fun thing is that the commit
message claims that "KoDs are handled more sensibly and will never bump
the polling interval up to anything ridiculous" (which, on the contrary,
is exactly what seems to be happening _after_ that change).
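
To make the clamping argument concrete, here is a toy model in Python.
It is not the ntpd code: the Peer class and both on_kod_* functions are
names made up for illustration, and the assumption that KOD handling
sets minpoll, maxpoll and hpoll all to 10 is only my reading of the
behaviour described above.  The starting values (minpoll = maxpoll = 4)
are likewise assumed, chosen to match a fixed poll interval with ppoll=4.

```python
from dataclasses import dataclass

KOD_POLL = 10  # hard-coded exponent from the description (2**10 s = 1024 s)


@dataclass
class Peer:
    """Toy association state; not the real ntpd peer structure."""
    minpoll: int
    maxpoll: int
    hpoll: int


def clamp(value: int, low: int, high: int) -> int:
    """Limit value to the closed range [low, high]."""
    return max(low, min(value, high))


def poll_update(peer: Peer, requested: int) -> None:
    """Clamp the requested host poll exponent to [minpoll, maxpoll]."""
    peer.hpoll = clamp(requested, peer.minpoll, peer.maxpoll)


def on_kod_overwriting_limits(peer: Peer) -> None:
    """Assumed current behaviour: the limits themselves become 10."""
    peer.minpoll = KOD_POLL
    peer.maxpoll = KOD_POLL
    peer.hpoll = KOD_POLL


def on_kod_preserving_limits(peer: Peer) -> None:
    """Hypothetical alternative: raise hpoll only, keep configured limits."""
    peer.hpoll = KOD_POLL


if __name__ == "__main__":
    stuck = Peer(minpoll=4, maxpoll=4, hpoll=4)
    on_kod_overwriting_limits(stuck)
    poll_update(stuck, 4)      # later valid exchange asks for poll 4
    print("limits overwritten:", stuck)

    recovers = Peer(minpoll=4, maxpoll=4, hpoll=4)
    on_kod_preserving_limits(recovers)
    poll_update(recovers, 4)   # same request, configured limits intact
    print("limits preserved:  ", recovers)
```

In the first case the clamp range has collapsed to [10, 10], so no later
poll_update call can bring hpoll back down; in the second the configured
maxpoll of 4 still applies and hpoll returns to 4.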
