Apparent protocol-machine bug, new top priority
Eric S. Raymond
esr at thyrsus.com
Sun Aug 27 14:28:18 UTC 2017
I wrote:
>If this is happening with iburst *off*, it becomes more difficult to
>understand how the rate limit is being triggered. I think maybe we
>should start by focusing on something else: why is hpoll not
>recovering after a KOD?
>
>I'm thinking this sounds like some KOD-recovery logic got lost during
>the refactor.
Trying to trace how things go bad. Looks to me like this piece of
logic down around line 592, processing a KOD, sets minpoll high:
    if(is_kod(pkt)) {
        if(!memcmp(pkt->refid, "RATE", REFIDLEN)) {
            peer->selbroken++;
            report_event(PEVNT_RATE, peer, NULL);
            if (peer->minpoll < 10) { peer->minpoll = 10; }
            peer->burst = peer->retry = 0;
            peer->throttle = (NTP_SHIFT + 1) * (1 << peer->minpoll);
            poll_update(peer, 10);
        }
        return;
    }
Then poll_update sets hpoll to 10. Achim seems to be reporting that
it stays stuck there. Now I look at this:
    void
    poll_update(
        struct peer *peer,  /* peer structure pointer */
        uint8_t mpoll
        )
    {
        unsigned long next, utemp;
        uint8_t hpoll;

        /*
         * This routine figures out when the next poll should be sent.
         * That turns out to be wickedly complicated. One problem is
         * that sometimes the time for the next poll is in the past when
         * the poll interval is reduced. We watch out for races here
         * between the receive process and the poll process.
         *
         * Clamp the poll interval between minpoll and maxpoll.
         */
        hpoll = max(min(peer->maxpoll, mpoll), peer->minpoll);
        peer->hpoll = hpoll;
This means that hpoll can never be set lower than minpoll. Which means
there will never be any recovery from the KOD rate limit, no matter
what values poll_update() is called with, unless minpoll is lowered.
But this never happens.
ntp_peer.c:721: peer->minpoll = min(minpoll, NTP_MAXPOLL);
ntp_peer.c:724: peer->minpoll = peer->maxpoll;
ntp_proto.c:596: if (peer->minpoll < 10) { peer->minpoll = 10; }
refclock_jjy.c:2788: peer->minpoll = 8 ;
refclock_oncore.c:621: peer->minpoll = 4;
refclock_trimble.c:469: peer->minpoll = TRMB_MINPOLL;
The ntp_peer.c hits are during new-peer initialization. The refclock hits
are irrelevant; we're troubleshooting the code path for NTP peers. My
deduction is that ntp_proto.c:596 is probably wrong: it's disabling
the normal poll interval hysteresis (which I admit I only vaguely
understand).
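
To make the clamp argument concrete, here is a minimal standalone sketch.
It is not the ntpd code: struct toy_peer and the toy_* and u8* helpers are
made up for illustration, and only the clamp expression and the minpoll
bump are copied from the excerpts above.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct peer; only the fields used here. */
struct toy_peer {
    uint8_t minpoll;
    uint8_t maxpoll;
    uint8_t hpoll;
};

static uint8_t u8min(uint8_t a, uint8_t b) { return a < b ? a : b; }
static uint8_t u8max(uint8_t a, uint8_t b) { return a > b ? a : b; }

/* Same clamp as poll_update(): the result is never below minpoll. */
static void toy_poll_update(struct toy_peer *p, uint8_t mpoll)
{
    p->hpoll = u8max(u8min(p->maxpoll, mpoll), p->minpoll);
}

/* Same effect on minpoll and hpoll as the RATE branch quoted earlier. */
static void toy_rate_kod(struct toy_peer *p)
{
    if (p->minpoll < 10) { p->minpoll = 10; }
    toy_poll_update(p, 10);
}
```

Starting from an assumed minpoll of 6 and maxpoll of 10, one call to
toy_rate_kod() leaves minpoll at 10, and after that toy_poll_update()
yields hpoll == 10 for any mpoll from 0 through 10, because nothing in
the sketch (or in the grep hits above) ever lowers minpoll again.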
But the problem may be deeper than that. The corresponding code in
Classic is this:
    /*
     * Check to see if this is a RATE Kiss Code
     * Currently this kiss code will accept whatever poll
     * rate that the server sends
     */
    peer->ppoll = max(peer->minpoll, pkt->ppoll);
    if (kissCode == RATEKISS) {
        peer->selbroken++;  /* Increment the KoD count */
        report_event(PEVNT_RATE, peer, NULL);
        if (pkt->ppoll > peer->minpoll)
            peer->minpoll = peer->ppoll;
        peer->burst = peer->retry = 0;
        peer->throttle = (NTP_SHIFT + 1) * (1 << peer->minpoll);
        poll_update(peer, pkt->ppoll);
        return;  /* kiss-o'-death */
    }
I see that our line 596 is a replacement for allowing the KOD packet
to set the poll rate. That makes all kinds of sense, as a spoofed KOD
packet with a maliciously high poll interval is an obvious DoS
vector. (See, Daniel? I are learning to think like an InfoSec
paranoid.)
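
A rough sketch of why the packet-driven version is the riskier one. The
struct and function names here are hypothetical, the two handlers keep
only the minpoll logic from the excerpts above, and the spoofed ppoll
value of 17 in the note below is just an example input.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct peer; only the fields used here. */
struct toy_peer {
    uint8_t minpoll;
    uint8_t ppoll;
};

/* Classic-style handling: minpoll follows the ppoll field of the packet. */
static void classic_rate_kod(struct toy_peer *p, uint8_t pkt_ppoll)
{
    p->ppoll = pkt_ppoll > p->minpoll ? pkt_ppoll : p->minpoll;
    if (pkt_ppoll > p->minpoll)
        p->minpoll = p->ppoll;
}

/* Our line 596: ignore the packet's ppoll, raise minpoll to at least 10. */
static void ntpsec_rate_kod(struct toy_peer *p)
{
    if (p->minpoll < 10) { p->minpoll = 10; }
}

/* Poll exponent to seconds: the interval is 2^poll. */
static unsigned long poll_seconds(uint8_t poll)
{
    return 1UL << poll;
}
```

With an assumed starting minpoll of 6, a spoofed ppoll of 17 pushes the
Classic-style minpoll to 17, i.e. 131072 seconds between polls, while the
fixed floor gives 10, i.e. 1024 seconds.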
Unfortunately for this neat theory, the corresponding grep hits in
Classic are:
ntp_peer.c:857: peer->minpoll = NTP_MINDPOLL;
ntp_peer.c:859: peer->minpoll = min(minpoll, NTP_MAXPOLL);
ntp_peer.c:865: peer->minpoll = peer->maxpoll;
ntp_proto.c:1589: peer->minpoll = peer->ppoll;
Again, the ntp_peer.c hits are during newpeer initialization. That
is, I can't find any way that minpoll recovers after a KOD in
Classic, either.
What am I missing here?
--
Eric S. Raymond <http://www.catb.org/~esr/>
Rifles, muskets, long-bows and hand-grenades are inherently democratic
weapons. A complex weapon makes the strong stronger, while a simple
weapon -- so long as there is no answer to it -- gives claws to the
weak.
-- George Orwell, "You and the Atom Bomb", 1945