Concerning the ntp-4.2.8p8 security fixes

Fri Jun 3 14:57:31 UTC 2016

As I suspected and Miroslav just confirmed
(http://bugs.ntp.org/show_bug.cgi?id=3044 comment #5, and in more
detail privately), the description of CVE-2016-4954 in NTP.org's
security advisory is wrong. Here's how the vulnerability works and
what an attacker can do with it:

The receive() function runs a barrage of sanity checks on each
incoming packet; when one fails, it sets a flag named TEST# (where #
represents a numeral) in NTP Classic, which NTPsec renamed to BOGON#.
Among the most important of these checks is BOGON2, which is set when
there is a mismatch between the transmit timestamp originally sent
from the client and the origin timestamp received from the server.
When no cryptographic authentication is in use, this check becomes
NTP's primary defense against spoofing: since the low bits of transmit
timestamps are randomized, they should be hard for an off-path
attacker to guess. (Of course, an on-path attacker, aka a MITM, can
just read them off the wire -- in which case your defenses are reduced
to crypto or bust).
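
To make the origin check concrete, here is a minimal Python sketch of
the idea. It is not ntpd's code: Peer, make_request and
origin_check_fails are hypothetical names, timestamps are modeled as
plain integers, and RANDOM_BITS is an arbitrary figure chosen for
illustration rather than the number of bits ntpd actually randomizes.

```python
import secrets

RANDOM_BITS = 16  # assumption for illustration only, not ntpd's real figure


class Peer:
    """Hypothetical per-server state kept by the client."""

    def __init__(self):
        self.sent_xmt = None  # transmit timestamp of our last request


def make_request(peer, now_fixed):
    """Return a transmit timestamp whose low bits are randomized."""
    mask = (1 << RANDOM_BITS) - 1
    xmt = (now_fixed & ~mask) | secrets.randbits(RANDOM_BITS)
    peer.sent_xmt = xmt  # remember it so the reply can be matched
    return xmt


def origin_check_fails(peer, reply_origin):
    """True when the reply does not echo our transmit timestamp.

    This is the condition that would set TEST2/BOGON2.
    """
    return peer.sent_xmt is None or reply_origin != peer.sent_xmt
```

An off-path attacker who cannot see the request has to guess every
randomized bit to get a reply past origin_check_fails(); an on-path
attacker simply copies the value off the wire.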

However, when receive() sets a bogon bit, it often doesn't return
immediately; instead, it continues running the rest of the checks so
that admins troubleshooting their NTP configuration can see
*everything* that was wrong with the packet, not just the first check
that failed. At least, that's the original intent; after decades of
patch accretion the code has slowly become pretty inconsistent in how it
approaches this. Packets that fail MAC verification, for instance, get
discarded earlier. But anyway, the origin timestamp check is one of
those checks where it initially just sets a bit, and waits until the
very end to reject the packet if that bit is set.

The vulnerability is that in between the test and the rejection, some
state variables get updated from the packet's contents. Therefore, an
attacker can temporarily manipulate these variables using spoofed
packets that are ultimately rejected.
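
The ordering problem can be sketched in a few lines of Python. This is
a toy model under stated assumptions, not the real receive(): Packet,
PeerState, receive_vulnerable and receive_fixed are hypothetical
names, and rejecting before touching any state is just one way to
close the window, not necessarily how the actual patch does it.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    origin: int      # origin timestamp echoed by the (claimed) server
    leap: int        # leap indicator carried in the packet
    rootdisp: float  # root dispersion carried in the packet


@dataclass
class PeerState:
    sent_xmt: int          # transmit timestamp of our last request
    leap: int = 0
    rootdisp: float = 0.0


def receive_vulnerable(peer, pkt):
    """Flag the failure, keep going, and reject only at the very end."""
    bogon2 = pkt.origin != peer.sent_xmt  # the check merely sets a flag
    peer.leap = pkt.leap                  # state is updated regardless
    peer.rootdisp = pkt.rootdisp
    if bogon2:
        return False                      # rejected, but the damage is done
    return True


def receive_fixed(peer, pkt):
    """Reject a failed origin check before any state is touched."""
    if pkt.origin != peer.sent_xmt:
        return False
    peer.leap = pkt.leap
    peer.rootdisp = pkt.rootdisp
    return True
```

Feeding both functions the same spoofed packet (wrong origin, leap bit
set) gets it rejected either way, but only receive_vulnerable() leaves
the leap indicator set in the peer state afterwards.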

The most interesting among these variables is the leap indicator.
Suppose a client talks to three servers. An attacker sends spoofed
packets purporting to come from these servers, all with the leap
indicator set. These packets, lacking correct origin timestamps, will
get dropped before they make it to clock selection, but not until
after the leap indicator variable gets set. Then, a legitimate packet
arrives; its leap indicator is clear, but now when clock selection is
entered, the client sees two set leap indicators and one clear; that's
a quorum, so the leap timer gets armed to go off at the end of the
month. If the end of the month comes before the next polling interval
elapses, then the timer goes off and a leap second gets inserted into
the system clock.
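
Here is the three-server scenario above walked through as a small
Python sketch. The names are hypothetical, and the quorum rule is
assumed to be a simple majority of servers with the leap indicator
set; the real clock-selection logic is more involved than this.

```python
def leap_quorum(leap_flags):
    """Assumed rule: strictly more than half of the servers say 'leap'."""
    return sum(leap_flags) * 2 > len(leap_flags)


def run_scenario():
    """Replay the attack; return True if the leap timer would be armed."""
    # Per-server leap indicator as remembered by the client.
    leap = {"server_a": 0, "server_b": 0, "server_c": 0}

    # Spoofed packets for all three servers: each fails the origin check
    # and is dropped, but only after its leap indicator has been recorded.
    for name in leap:
        leap[name] = 1

    # A legitimate packet from server_a arrives with the indicator clear.
    leap["server_a"] = 0

    # Clock selection now sees two set indicators and one clear.
    return leap_quorum(list(leap.values()))
```

Under these assumptions run_scenario() returns True: two of three
indicators are still set when clock selection runs, so the timer is
armed even though no legitimate server ever announced a leap second.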

A slightly less interesting attack involves similarly manipulating the
root dispersion field. Servers for which this field is too high will
be disfavored by NTP's clustering algorithm, allowing the attacker
some control over which servers survive filtering.

Anyway, although NTP.org blew this advisory, they did get the patch
correct, and as I reported in my previous email, I've already ported
and pushed that patch as of yesterday morning. I'm on the fence as to
whether this bug is bad enough to merit tagging a release right away.
Both NTP.org and the Red Hat folks who discovered the bug are
downplaying it, but I'm leaning toward yes given that even
*legitimate* leap seconds have a long history of creating ops havoc,
so a bogus one could be especially insidious.